
Global, in situ analysis of the structural proteome in individuals with Parkinson’s disease to identify a new class of biomarker


Parkinson’s disease (PD) is a prevalent neurodegenerative disease for which robust biomarkers are needed. Because protein structure reflects function, we tested whether global, in situ analysis of protein structural changes provides insight into PD pathophysiology and could inform a new concept of structural disease biomarkers. Using limited proteolysis–mass spectrometry (LiP–MS), we identified 76 structurally altered proteins in cerebrospinal fluid (CSF) of individuals with PD relative to healthy donors. These proteins were enriched in processes misregulated in PD, and some proteins also showed structural changes in PD brain samples. CSF protein structural information outperformed abundance information in discriminating between healthy participants and those with PD and improved the discriminatory performance of CSF measures of the hallmark PD protein α-synuclein. We also present the first analysis of inter-individual variability of a structural proteome in healthy individuals, identifying biophysical features of variable protein regions. Although independent validation is needed, our data suggest that global analyses of the human structural proteome will guide the development of novel structural biomarkers of disease and enable hypothesis generation about underlying disease processes.
Identification of proteome structural variations between the healthy and PD cohort groups a, Data analysis workflow to identify structural peptide variation, PK-independent peptide variation, and protein abundance variation in the CSF between the cohort groups (β = coefficient(s) of the linear model, ~ = equates.). b, Histogram showing the results of the analysis of the CSF data, visualizing the P values of the cohort variables estimated via t statistics from the coefficients of three different types of linear models, indicated by color. Effects based on structural variations (blue), PK-independent peptide variations (light green), and protein abundance variations (dark green) are shown. For all models, the first bar (extreme left) indicates significant (<0.05) P values. c, Histogram showing the results of the analysis of the CSF data, visualizing the P values of the age, cohort, and sex variables, indicated by color, estimated from the coefficients of the structural variations model. Effects based on cohort membership (blue), age (magenta), and sex (yellow) are plotted. Significant P values are as in b. d, Coefficients of each peptide, reflecting cohort membership in the linear model assessing structural variation (compare with the blue histograms in b and c), are plotted against their corresponding P value after applying protein-wise false-discovery rate correction (Benjamini–Hochberg procedure). Each point represents a single peptide; blue indicates candidate biomarker peptides that vary between the healthy and PD cohort groups in the CSF. e, GO term analysis of the proteins corresponding to the candidate biomarker peptides in d. The enrichment is computed relative to all proteins included in the data analysis. f, Histogram corresponding to the analysis of the brain data visualizing the P values of the cohort variables estimated via t statistics from the coefficients of three different types of linear model, indicated by color. 
Color and significant P values are as in b. Source data
Structural changes in selected proteins that are altered between healthy individuals and those with PD a, Peptide coverage plot for NrCAM in CSF (top) and brain (bottom). Black indicates all analyzed peptides; red indicates significantly altered structural peptides between PD and healthy samples, after correction for covariates. Half-tryptic peptides were not visualized. Ig-like (dark gray) and fibronectin type-III domains (light gray) are highlighted. b,c, AlphaFold-predicted structure of PITH1, colored according to peptides in the CSF (b) and brain (c) data. Peptide colors are as in a. d, Peptide coverage plot for PITH1 in CSF (top) and brain (bottom). Peptide colors are as in a; half-tryptic peptides are in gray. e,f, Structure (e) and peptide coverage plot (f) of homotrimeric CBLN1 (PDB 5KC6), colored as in a, according to peptides in CSF data; half-tryptic peptides are in gray (f). g, Scaled residuals for healthy individuals and for those with PD for the indicated peptide of CBLN1. Box plots: median, center; first and third quantile, lower and upper hinges; largest/smallest value no further than 1.5 × inter-quantile range of the hinge, whiskers; data points beyond are defined as outliers and plotted individually (n = 51 participants in HG and n = 49 participants in PDG). h, Structure of vitamin-D-binding protein GC (PDB 1J78) with bound vitamin D (yellow spheres), colored as in a, according to peptides from in situ CSF data. i, Peptide coverage plot for GC in CSF. Colors are as in d. The light blue box indicates the protein region of the altered structural peptide. j, Scaled residual for the indicated peptide of GC, as in g (n = 51 participants each in HG and PDG). k, Structure of the vitamin-D-binding protein (as in h), colored according to the peptides that change upon substrate (calcitriol) addition in vitro.
Red, significantly changed peptides (adjusted P values < 0.05, log2(fold change) < −1 or > 1, P values estimated with a two-sided t-test); white, detected but unchanged peptides. The blue circle encloses the peptide overlapping with the in situ-identified altered peptide. l, Coverage plot for significant peptides from k.
Nature Structural & Molecular Biology
Article https://doi.org/10.1038/s41594-022-00837-0
Global, in situ analysis of the structural
proteome in individuals with Parkinson’s
disease to identify a new class of biomarker
Marie-Therese Mackmull1,9, Luise Nagel2,9, Fabian Sesterhenn1, Jan Muntel3,
Jan Grossbach2, Patrick Stalder1, Roland Bruderer3, Lukas Reiter3,
Wilma D. J. van de Berg4,5, Natalie de Souza1,6, Andreas Beyer2,7,8 and
Paola Picotti1
The rise of chronic diseases in aging populations necessitates a deeper
understanding of disease pathophysiology and robust biomarkers for
early disease detection and stratification of individuals with disease1.
Studies of disease by MS-based proteomics typically identify varying
protein abundances, post-translational modifications (PTMs), or
isoform prevalence2–6. These global analyses are, however, blind to
many molecular events that affect protein function, such as the bind-
ing of small molecules, protein-protein interactions, misfolding, and
protein conformational changes. Reasoning that most molecular events
that affect protein function would affect protein structure7,8, we tested
Received: 30 April 2021
Accepted: 18 August 2022
Published online: xx xx xxxx
1Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland. 2Cluster of Excellence Cellular Stress Responses in
Aging-associated Diseases (CECAD), University of Cologne, Cologne, Germany. 3Biognosys AG, Schlieren, Switzerland. 4Amsterdam UMC location
Vrije Universiteit Amsterdam, Section Clinical Neuroanatomy and Biobanking, Department Anatomy and Neurosciences, Amsterdam, the Netherlands.
5Amsterdam Neuroscience, Neurodegeneration, Amsterdam, the Netherlands. 6Department of Quantitative Biomedicine, University of Zurich, Zurich,
Switzerland. 7Faculty of Medicine and University Hospital of Cologne, and Center for Molecular Medicine Cologne, University of Cologne, Cologne,
Germany. 8Institute for Genetics, Faculty of Mathematics and Natural Sciences, University of Cologne, Cologne, Germany. 9These authors contributed
equally: Marie-Therese Mackmull, Luise Nagel. e-mail: andreas.beyer@uni-koeln.de; picotti@imsb.biol.ethz.ch
proteome of individuals with PD relative to that of healthy individuals.
Our cohort consists of 52 individuals diagnosed with sporadic PD (PD
group, or PDG), with an average disease duration of 5.8 years, and 51
healthy individuals (HG) (Extended Data Fig. 1a) covering the same age
range. Both sexes are represented in each cohort group (Extended Data
Fig. 1b). Quantitative clinical parameters were available for the PDG,
and levels of previously identified PD-associated biochemical markers
(levels of total (t-α-Syn), phosphorylated (p-α-Syn), and oligomeric
α-synuclein (o-α-Syn)) had been previously collected for the complete
cohort (Extended Data Fig. 1a)17,19–22,24,25.
CSF samples were collected by lumbar puncture19 under
near-native conditions and were subjected to LiP–MS (Fig. 1a). Each
sample was split into two; one aliquot underwent limited proteolysis
(LiP) followed by trypsin digestion prior to MS, and the other under-
went only trypsin digestion (trypsin-only). We used the trypsin-only
data to measure protein-abundance changes and peptide-level changes
that could be due to covalent modifications such as PTMs or peptide
sequence changes from alternative splicing, RNA editing, endogenous
proteolysis, or mutations. The LiP data were used to identify alterations
in protein structure.
We used a data-independent acquisition (DIA) strategy com-
bined with high-quality spectral libraries and label-free quantifica-
tion to monitor protein abundance and structural changes across
the cohort. We identified more than 2,100 proteins and more than
48,000 peptides for both LiP and trypsin-only data before filtering
(Supplementary Tables 1 and 2), consistent with previous studies26
but without depletion of abundant proteins. The number of peptides
and proteins at various processing and analysis steps is shown in
Supplementary Table 3. α-Synuclein was only sparsely detected
owing to low endogenous levels, consistent with previous prot-
eomic data27, and was therefore not included in our analyses. Gene
Ontology (GO) enrichment analysis of all identified proteins showed
the enrichment of gene sets expected in the CSF (Extended Data
Fig. 1c)28. We used these data to study the variability of the structural
proteome between healthy individuals and to identify structural
biomarker candidates by comparing healthy individuals and those
with disease (Fig. 1b).
Variability of the healthy human cerebrospinal fluid structural
proteome
We first assessed the variability of protein structures in the CSF of the
51 healthy participants. To disentangle changes in LiP peptide intensities
due to structural variation from those due to protein abundance
variation or to peptide-level variation caused by covalent modifica-
tion, cleavage, or sequence changes, hereafter collectively termed
proteinase K (PK)-independent peptide variation, we first compared
the s.d. of LiP peptide intensities with that of the trypsin-only peptide
intensities across the healthy samples (Fig. 2a). We then estimated a
variability score for each peptide, which provides a measure of varia-
tion in LiP peptide intensities due to structural variation, corrected for
PK-independent variation. The strong increase in positive variability
scores indicates that there is a signal for protein structural variability
in addition to protein abundance variability (Fig. 2b).
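The comparison behind Fig. 2a,b can be illustrated with a minimal sketch. It assumes log2 intensities for a single peptide across participants, computes the s.d. of the LiP and of the trypsin-only measurements, and calculates Levene's W statistic for the difference in spread between the two. The function name, the toy values, and the choice to contrast the two measurement types directly are assumptions made for illustration; the definition of the variability score itself is given in the Methods and is not reproduced here.

```python
from statistics import mean, stdev

def levene_w(*groups):
    """Levene's W statistic using absolute deviations from each group mean.
    Assumes at least two groups and non-zero within-group spread."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every value from its own group mean
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical log2 intensities of one peptide in four participants
lip = [10.0, 14.0, 18.0, 22.0]
tryp_only = [15.0, 16.0, 17.0, 18.0]

sd_lip, sd_tryp = stdev(lip), stdev(tryp_only)
w = levene_w(lip, tryp_only)
print(f"s.d. LiP = {sd_lip:.2f}, s.d. trypsin-only = {sd_tryp:.2f}, W = {w:.2f}")
```

A peptide whose LiP intensities spread more widely than its trypsin-only intensities, as in this toy case, is the kind of signal that would point to structural rather than abundance variability.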
We classified 117 peptides as structurally highly variable (1.2%,
variability score > 1 with P < 1 × 10⁻⁵), 386 peptides as medium-variable
(3.9%, variability score of 0.5–1 with P < 1 × 10⁻⁵), and 9,385 peptides
(all others), derived from 994 unique proteins, as non-variable across
healthy individuals (Supplementary Tables 4 and 5). The 117 highly
variable peptides originated from 64 unique proteins, suggesting that
~6.5% of the detected CSF proteins show at least one structurally highly
variable region across healthy individuals, in addition to any changes
due to covalent modifications, cleavage, or sequence alterations, which
were removed in this analysis. Functional enrichment analysis of these
structurally variable proteins identified terms across a range of cellular
and neuronal functions.
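The classification rule stated above can be written as a short sketch. The thresholds come from the text; the function name and the inclusive handling of the 0.5 and 1 boundaries are assumptions. The last lines recompute the reported percentages from the peptide counts.

```python
def classify_peptide(variability_score, p_value, p_cutoff=1e-5):
    """Assign a variability class from the score and P-value thresholds.
    Boundary handling (0.5 and 1 inclusive for 'medium') is an assumption."""
    if p_value < p_cutoff and variability_score > 1:
        return "highly variable"
    if p_value < p_cutoff and 0.5 <= variability_score <= 1:
        return "medium-variable"
    return "non-variable"

# Peptide counts reported in the text
counts = {"highly variable": 117, "medium-variable": 386, "non-variable": 9385}
total = sum(counts.values())
percent = {label: round(100 * n / total, 1) for label, n in counts.items()}
print(total, percent)
```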
whether global analyses of the human structural proteome reflected
pathological alterations and whether this could yield a novel type of
structural biomarker of disease.
To detect protein structural alterations directly in biofluids of partici-
pants in clinical cohorts, we have used LiP–MS, our previously developed
structural-proteomics approach. LiP–MS probes structural alterations
on a proteome-wide scale in complex, near-native contexts9,10, and cap-
tures a variety of functional alterations (for example, allostery, enzyme
activation, active-site occupancy, chemical modification, aggregation)
with the resolution of single functional sites11. In brief, LiP–MS uses a
non-specific protease to generate structure-specific proteolytic patterns
that can be analyzed by MS.
We used LiP–MS to identify structural alterations in the CSF of indi-
viduals with Parkinson’s disease. PD affects around 1% of the world popula-
tion over age 60, lacks early diagnostic biomarkers, and presently cannot
be cured. Additionally, PD is a prominent example of a disease associated
with altered protein structure. The protein α-synuclein plays a central role
in PD pathology and forms oligomers and proteinaceous deposits, called
Lewy bodies, in the brain12–14. Transcriptomics and classical proteomics
studies report relatively sparse changes in body fluids of people with PD,
some of which are non-specific to PD15,16. Previous studies have identified
oligomeric and aggregated states of α-synuclein and altered activation
states of endolysosomal enzymes in the CSF of people with PD, which
show potential as PD protein markers17–19, in support of our hypothesis
that analysis of protein structures might be informative about the disease
and may lead to biomarker identification. Altogether, given the lack of
early biomarkers and the potential for structural changes accompany-
ing disease onset, we chose PD to test whether global protein structural
analysis can identify potential biomarkers.
We applied LiP–MS to CSF from a well-characterized cohort of
individuals with early-diagnosed PD relative to age-matched healthy
donors17,19–22. CSF is in constant exchange with brain tissue and therefore
may harbor brain-derived structurally altered proteins. We used a sta-
tistical model to account for age, sex, protein abundance, and technical
confounding factors, and identified 76 proteins that were structurally
altered in individuals with PD relative to healthy donors. These candidate
structural biomarkers were enriched for processes that are known to
be misregulated in PD, such as synapse maintenance and acetylcholine
metabolism. Notably, a combinatorial subset of structurally informative
CSF peptides distinguished individuals with PD from healthy participants
with better performance than protein abundance data, and provided
complementary information to measures of the PD-associated protein
α-synuclein23. Further, more than half of the detected proteins with struc-
tural changes in the CSF were also structurally changed in brain samples
from a small, independent cohort, indicating that the approach has the
potential to capture pathological processes; several of these proteins
have been previously linked to PD. Lastly, analysis of the inter-individual
variability of the human structural proteome using data from healthy
individuals showed that more than 60 human CSF proteins had regions
with high variability between individuals. Variable regions were more
disordered, were more accessible to solvents, and had a higher propensity
for mediating protein-protein interactions than non-variable regions.
Taken together, our findings identify structurally altered proteins
that distinguish individuals with PD from healthy donors. A subset of
these proteins might inform on disease mechanisms in the brain and
help to define the molecular profile of PD, although clinical validity will
require testing in larger independent cohorts. We propose that protein
structures have great potential to reveal pathological processes in PD
and could serve in the future as a novel type of biomarker of disease.
Results
Identification of structural changes in human cerebrospinal
fluid
To probe the concept of structural biomarkers of disease, we systemati-
cally investigated whether structural changes are detectable in the CSF
Regions associated with highly variable peptides had a slightly
lower propensity to form beta strands and a higher propensity to form
alpha-helices than did non-variable regions, but had the same propen-
sity to be part of loops (Extended Data Fig. 2a). All variable peptides
had a higher predicted propensity to bind other proteins (Fig. 2c)29, but
not nucleic acids (Extended Data Fig. 2b,c), consistent with their larger
solvent-accessible surface area (Extended Data Fig. 2d). Proteins with
at least one highly variable peptide showed more physical and func-
tional high-confidence interactions (STRING confidence score > 0.9,
Extended Data Fig. 2e)30, but did not differ in their sequence length
or number of domains compared with all other analyzed proteins
(Extended Data Fig. 2f,g). Variable protein regions showed significantly
higher predicted disorder than non-variable regions (Fig. 2d)30, provid-
ing a structural rationale for their variability in situ. We also noted that
15 of the 117 highly variable peptides showed a bimodal distribution
of their LiP intensities (for example, fructose bisphosphate aldolase,
Fig. 2f); the remaining peptides showed a unimodal distribution (for
example, semaphorin 7A, Fig. 2h) (Supplementary Table 5). These
bimodal variable peptides did not show elevated levels of disorder
(Fig. 2e). The bimodal distributions potentially reflect two structural
states of the corresponding protein, for example, the ligand-bound
and ligand-unbound state, and within an individual the protein would
exist predominantly in one state. For example, fructose-bisphosphate
aldolase showed two highly variable peptides, one bimodal and the
other unimodal, in both the healthy and PD groups (Fig. 2f). Although
the bimodal peptide mapped close to the enzyme active site (4.8 Å)
(Fig. 2g), LiP–MS analysis of the in vitro enzyme suggested that the
bimodal distribution does not reflect the substrate-bound versus
unbound state (Extended Data Fig. 2h and Supplementary Table 6).
The in situ-measured unimodal peptide, however, did significantly
change upon substrate addition in vitro, so inter-individual variability
at this site could reflect different levels of substrate occupancy. The
variable unimodal peptide on semaphorin 7A (Fig. 2h) mapped to an
unstructured region (Fig. 2i).
In summary, our structural approach enabled the unprecedented
in situ identification and global analysis of CSF proteins with structurally
varying regions across healthy individuals. Predicted disorder,
surface accessibility, and the propensity for protein-protein interac-
tions are associated with high baseline structural variability. Note that
similar variation might be detectable for the same individual over time.
Structural proteomic changes in cerebrospinal fluid of people
with Parkinson’s disease
We went on to probe differences between CSF proteins of healthy indi-
viduals and those with PD. Given the complexity of human cohorts,
robust identification of disease-specific changes requires accounting
for several covariates. We designed a data analysis pipeline to
achieve this (Fig. 3a). We performed multiple linear regression, fit-
ting independent models to predict three types of data: LiP peptide
intensities (Supplementary Tables 4 and 7), which probe structural
variation, trypsin-only peptide intensities (Supplementary Tables 4
and 8), which indicate PK-independent peptide variation, and protein
abundance variation, estimated from the trypsin-only samples (Sup-
plementary Tables 4 and 9). In addition to cohort group membership
(HG versus PDG), we included variables representing age and sex for
each sample, and the two-way interactions of all three demographic
variables. Furthermore, we corrected for technical covariates, such as
total protein concentration measured before sample preparation, and
batch effects.
[Figure 1: schematic. Panel a labels: healthy group, PD group; LiP (PK) followed by trypsin, and trypsin only; structural changes, abundance changes. Panel b labels: proteomic data (abundance changes, structural changes), demographic data, clinical data; variability of the structural proteome (properties of highly variable peptides: secondary structure, intrinsic disorder, binding interface, interaction); biomarker candidates (structural biomarkers, classification of PD ± α-synuclein measurements).]
Fig. 1 | Schematic overview of the study. a, Schematic overview of the
experimental pipeline for probing structural changes in the CSF of healthy and
PD cohort groups. b, Overview of the data analysis pipeline for the identification
of variable peptides in the CSF of healthy individuals, and for the identification of
structural biomarker peptides that vary between healthy individuals and those
with PD.
We modeled the intensities of LiP peptides, trypsin-only peptides,
and protein abundance as a function of the demographic and
technical covariates described above, to correct for these covariates.
In the case of LiP peptides, we also included trypsin-only peptides
and protein abundance as additional covariates to correct for
intensity variation that was not due to peptide accessibility changes;
this would identify LiP peptide intensity changes due only to protein
structural alterations. Likewise, we corrected trypsin-only peptide
intensities for protein abundance to distinguish PK-independent
peptide-specific effects (for example, PTMs) from whole-protein
abundance changes. Finally, we statistically tested the coefficients
of all three models to determine significant effects of the cohort,
age, and sex variables.
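One way to write down the three models described above is sketched below. The notation is an assumption introduced for illustration: C, A, and S denote cohort membership, age, and sex for sample i; T and LiP denote the trypsin-only and LiP intensities of peptide p; P denotes the abundance of the parent protein g; z collects the technical covariates (total protein concentration and batch); and each model has its own coefficients. The exact parameterization used in the study is the one given in Fig. 3a and the Methods.

```latex
\begin{align*}
D_i(\beta) &= \beta_{C} C_i + \beta_{A} A_i + \beta_{S} S_i
  + \beta_{CA} C_i A_i + \beta_{CS} C_i S_i + \beta_{AS} A_i S_i \\[4pt]
\text{structural variation:}\quad
\mathrm{LiP}_{p,i} &= \beta_0 + D_i(\beta) + \beta_{T}\, T_{p,i} + \beta_{P}\, P_{g,i}
  + \gamma^{\top} z_i + \varepsilon_{p,i} \\
\text{PK-independent variation:}\quad
T_{p,i} &= \beta'_0 + D_i(\beta') + \beta'_{P}\, P_{g,i}
  + \gamma'^{\top} z_i + \varepsilon'_{p,i} \\
\text{protein abundance:}\quad
P_{g,i} &= \beta''_0 + D_i(\beta'') + \gamma''^{\top} z_i + \varepsilon''_{g,i}
\end{align*}
```

In this form, a cohort coefficient that differs significantly from zero in the first model reflects a LiP intensity difference between HG and PDG that is not explained by the trypsin-only peptide intensity or by protein abundance, which is the sense in which the change is attributed to structure.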
[Figure 2, panels a–i. a, Variability: LiP intensities (s.d.) versus trypsin-only intensities (s.d.). b, Variability: −log10(P value) versus variability score. c, PPI propensity: SCRIBER score for non-variable, medium-variable, and highly variable peptides. d,e, Disorder/bimodality: Disopred3 score by variability class and for non-bimodal versus bimodal peptides. f, Fructose-bisphosphate aldolase, peptide ALQASALK: LiP intensities (a.u.) versus trypsin-only intensities (a.u.), colored by cohort (HG, PDG). h, Semaphorin 7A, peptide SVLQSINPAEPHK: same axes and coloring. g,i, Protein structures. P values printed in panels c–e: 8.3 × 10⁻¹⁴, 6.7 × 10⁻⁷, <2.2 × 10⁻¹⁶, 0.00896, 8.4 × 10⁻¹¹.]
Fig. 2 | Structural variability of the proteome in healthy human CSF. a, The s.d.
of LiP peptide intensities against the s.d. of trypsin-only peptide intensities. Each
point represents a single peptide. Cut-offs for highly variable peptides (red line)
and medium-variable peptides (orange line) are shown. b, Volcano plot showing
the distribution of variability scores. Each point represents a single peptide. Red,
highly variable peptides; orange, medium-variable peptides; gray, non-variable
peptides. P values were estimated using Levene’s test. c, SCRIBER scores for the
indicated peptide classes. Box plots in all panels: median, center; first and third
quantile, lower and upper hinges; largest/smallest value no further than 1.5 ×
inter-quantile range of the hinge, whiskers; data points beyond are defined as
outliers and plotted individually. P values are indicated (Wilcoxon rank-sum test;
n=9,385 non-variable, 386 medium-variable, 117 high-variable peptides, from
51 participants). d, Disopred3 disorder scores for the indicated peptide classes.
Each point represents a single peptide. P values are indicated (Fisher’s exact test;
n values are as in c). e, Disopred3 scores of highly variable peptides, comparing
those with a bimodal distribution and those with a non-bimodal distribution of
LiP intensities. P values are indicated (Fisher’s exact test; n=102 non-bimodal,
15 bimodal peptides from 51 participants). f, Distribution of the LiP intensities
(log2) versus trypsin-only intensities (log2) for the indicated peptide (ALQASALK)
from fructose bisphosphate aldolase. Each point represents one participant.
g, Structure of human brain fructose bisphosphate aldolase (PDB 1XFB) showing
one subunit of the homotetramer in light blue (the other three subunits are
shown as gray surface). Yellow spheres represent the substrate, based on
alignment of PDB 1XFB with the ligand-bound muscle isoform (PDB 4ALD).
Highly variable peptides in dark red (bimodal peptide) and salmon (unimodal
peptide). h, Distribution of LiP intensities (log2) versus trypsin-only intensities
(log2) for a highly variable peptide (SVLQSINPAEPHK) from semaphorin 7A.
Each point represents one participant. i, Structure of semaphorin 7A (light blue),
in complex with Plexin C1 (gray, PDB 3NVQ). Non-variable peptides are in dark
blue, the highly variable peptide from h is in red, and medium-variable peptides
are in orange.
We analyzed the distributions of P values of the cohort effect for
the three types of models (Fig. 3b) and found that the P value distribu-
tion for structural variation, but not for PK-independent effects or
protein abundance, had more small P values for the cohort effect. This
suggests that structural changes correlated more strongly with PD state
than did the other two measures.
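The reasoning behind reading these histograms can be sketched as follows. If a variable had no effect, P values would be roughly uniform and about 5% would fall below 0.05; an excess in that first bin indicates signal. The function name and the toy P values are hypothetical and serve only to illustrate the comparison.

```python
def small_p_enrichment(p_values, alpha=0.05):
    """Return the observed fraction of P values below alpha and its ratio
    to the fraction expected under a uniform (no-effect) distribution."""
    observed = sum(p < alpha for p in p_values) / len(p_values)
    return observed, observed / alpha

# Hypothetical P values for the cohort coefficient of ten peptides
toy_p = [0.001, 0.01, 0.03, 0.2, 0.4, 0.6, 0.8, 0.9, 0.95, 0.99]
fraction, enrichment = small_p_enrichment(toy_p)
print(f"{fraction:.0%} below 0.05, {enrichment:.1f}-fold over the uniform expectation")
```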
We similarly asked whether the distribution of the P values for age
or sex showed a signal for structural variation. There was no increase
of small P values for the sex variable, indicating that we did not detect
significant protein structural differences between the CSF of men and
women (Fig. 3c and Extended Data Fig. 3a). In contrast, there was an
increase in the count of LiP peptides with small P values in the distribu-
tions representing age, indicating that we detected protein structural
changes in CSF upon aging.
To identify potential structural biomarkers of disease, we focused
on changes between the healthy and disease cohort groups. After cor-
recting for a potential bias for large proteins (Methods), we selected all
peptides with a significant cohort effect (corrected P values < 0.05) as
candidate structural biomarkers, yielding 88 LiP peptides correspond-
ing to 76 proteins (out of 859 proteins and 7,144 peptides included
in the data analysis) (Fig. 3d and Supplementary Table 10). Two of
these 76 structurally altered proteins in the CSF of individuals with
PD, the alpha-1 (III) chain of collagen (COL3A1) and a peptidyl-glycine
alpha-amidating monooxygenase (PAM), correspond to PD-linked
genes from genome-wide association studies31–33. GO enrichment analysis
on the structurally altered proteins yielded terms linked to known PD
mechanisms (Fig. 3e), including, for example, several synapse-related
terms, which could reflect the synaptic dysfunction and loss associated
with numerous motor and non-motor symptoms of PD34–42. We
note that 87/88 of the peptides indicating structural alterations in
PD were classified as non-variable in our analysis of healthy CSF and
are therefore not likely to be confounded by background variability.
Using a data-analysis method designed to correct for multiple
covariates, we have identified 76 structurally altered proteins between
the CSF of healthy individuals and individuals with PD.
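The selection step can be illustrated with a minimal sketch of a Benjamini–Hochberg correction applied protein by protein, followed by the 0.05 cut-off. Interpreting "protein-wise" as a separate correction over the peptides of each protein is an assumption made here; the procedure actually used, including the correction for large proteins, is described in the Methods. Protein and peptide names are hypothetical.

```python
from collections import defaultdict

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def protein_wise_bh(peptide_pvalues):
    """Apply BH separately to the peptides of each protein.
    peptide_pvalues: iterable of (protein, peptide, p_value) tuples."""
    by_protein = defaultdict(list)
    for protein, peptide, p in peptide_pvalues:
        by_protein[protein].append((peptide, p))
    result = {}
    for protein, entries in by_protein.items():
        adj = benjamini_hochberg([p for _, p in entries])
        for (peptide, _), q in zip(entries, adj):
            result[(protein, peptide)] = q
    return result

# Hypothetical cohort-effect P values
toy = [("PROT_A", "pep1", 0.01), ("PROT_A", "pep2", 0.04),
       ("PROT_A", "pep3", 0.03), ("PROT_B", "pep4", 0.2)]
adjusted = protein_wise_bh(toy)
candidates = sorted(key for key, q in adjusted.items() if q < 0.05)
print(candidates)
```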
Structural proteomic changes in brains of individuals with
Parkinson’s disease
To assess whether the detected structural changes in CSF reflect patho-
logical events in the brain, we applied our approach (see Methods for
analytical differences) to postmortem temporal cortex samples from
a small independent cohort (5 individuals with clinically defined and
pathologically confirmed PD, 5 age-matched healthy individuals, all
female) and asked whether proteins with structural changes in the PD
CSF have those changes in the brain as well. As expected, the overlap of
detected proteins between the two datasets was low because the CSF
has many proteins with extracellular functions (Extended Data Fig. 3b).
Out of the 76 proteins with a structural change in CSF, we detected 27
proteins in the brain data. Strikingly, 16 of these 27 proteins showed
a structural change in both the brains and CSF of individuals with PD,
relative to healthy individuals. At the peptide level, 11 peptides with
a structural change in PD CSF compared with healthy CSF were also
present in the brain dataset. Most of these structural peptides (8/11)
changed accessibility in the same direction (more versus less accessi-
ble) in the brain and CSF (Extended Data Fig. 3c), indicating that these
structural changes in peptides in PD were conserved between the CSF
and brain. Notably, as in the CSF, peptides with structural changes
in the PD brain were more strongly correlated with the disease state
than were peptides indicating protein abundance changes or other
PK-independent peptide changes (Fig. 3f). This confirms that a global
structural analysis is a powerful approach to characterize disease states,
and that we could validate structural changes detected in the PD CSF
in the brains of individuals with PD.
Three proteins that showed structural changes in the PD brain
corresponded to PD-linked genes in genome-wide association studies:
AP2-associated protein kinase 1 (AAK1)43, CYFIP-related Rac1 interac-
tor B (CYRIB)32, and apolipoprotein E (APOE)44,45, the last of which is a
known component of Lewy bodies in PD brains46,47. Unlike in the CSF,
α-synuclein was detected in the brain data, but we did not detect a
statistically significant structural change in α-synuclein, due to a strong
outlier individual in the PD group of this small dataset.
Structural alterations of proteins in Parkinson’s disease
cerebrospinal fluid and brain are linked to disease
We examined specific structurally altered proteins in more detail, prob-
ing their disease link and asking whether structurally altered peptides
map to functionally informative sites on the protein structure. We
focused initially on proteins that changed structurally in both PD CSF
and brain, as these are more likely to be linked to pathological processes.
We identified 16 such proteins: almost all (15/16) have been previously
associated with PD, and more than half of the proteins show clear links
to disease (Supplementary Table 11). For instance, this group includes
proteins that regulate neurotransmitter signaling, including that of
dopamine (synaptotagmin 1)48,49 and nicotinic acetylcholine receptors
(lymphocyte antigen 6H)50, cause PD-like phenotypes in animal models
(stathmin 2)51, or are components of Lewy bodies (peroxiredoxin-6)52.
The neuronal cell adhesion molecule (NrCAM) and the PITH
domain-containing protein 1 (PITHD1) had reasonable sequence coverage.
In the brain and CSF, NrCAM had structural alterations in similar
regions (Fig. 4a). For PITHD1, the peptides were identical
between the brain and CSF (Fig. 4b–d). In the brain and CSF, both
proteins showed accessibility changes in the same direction, further
supporting that the structural changes in PD were preserved between
these samples. NrCAM, an important protein in neuronal development,
is transcriptionally upregulated in the substantia nigra of people with
PD53; PITHD1 is linked to synaptic functions in the olfactory bulb54,55,
and olfactory dysfunctions are early signs of PD. For the Lewy body
component PRDX6 (ref. 52), which has been linked to the PINK1–parkin
pathway56 and worsens dopaminergic neurodegeneration in a PD mouse
model57, we observed a single changed peptide in each of the brain and
CSF samples (Extended Data Fig. 4d–f).
We additionally analyzed structurally altered CSF proteins with
good sequence coverage and for which a three-dimensional structure
was available. For cerebellin-1 (CBLN1) (Fig. 4e–g), the single altered
Fig. 3 | Identification of proteome structural variations between the healthy
and PD cohort groups. a, Data analysis workflow to identify structural peptide
variation, PK-independent peptide variation, and protein abundance variation
in the CSF between the cohort groups (β = coefficient(s) of the linear model,
~ = equates). b, Histogram showing the results of the analysis of the CSF data,
visualizing the P values of the cohort variables estimated via t statistics from the
coefficients of three different types of linear models, indicated by color. Effects
based on structural variations (blue), PK-independent peptide variations (light
green), and protein abundance variations (dark green) are shown. For all models,
the first bar (extreme left) indicates significant (<0.05) P values. c, Histogram
showing the results of the analysis of the CSF data, visualizing the P values of the
age, cohort, and sex variables, indicated by color, estimated from the coefficients
of the structural variations model. Effects based on cohort membership (blue),
age (magenta), and sex (yellow) are plotted. Significant P values are as in b.
d, Coefficients of each peptide, reflecting cohort membership in the linear
model assessing structural variation (compare with the blue histograms in
b and c), are plotted against their corresponding P value after applying protein-
wise false-discovery rate correction (Benjamini–Hochberg procedure). Each
point represents a single peptide; blue indicates candidate biomarker peptides
that vary between the healthy and PD cohort groups in the CSF. e, GO term
analysis of the proteins corresponding to the candidate biomarker peptides in d.
The enrichment is computed relative to all proteins included in the data analysis.
f, Histogram corresponding to the analysis of the brain data visualizing the
P values of the cohort variables estimated via t statistics from the coefficients
of three different types of linear model, indicated by color. Color and significant
P values are as in b.
Nature Structural & Molecular Biology
Article https://doi.org/10.1038/s41594-022-00837-0
[Fig. 3 panel content, recovered from figure text.
a, Workflow: sample properties (age, A; cohort membership, C; sex, S), MS measurements (LiP peptide intensities, PepLiP; trypsin-only peptide intensities, PepTO; trypsin-only protein abundances, ProtTO), and technical properties (total protein content, TPC; batch) are used to fit linear models to peptide or protein abundances; coefficients are evaluated via t statistics, followed by protein-wise P-value adjustment.
Structural variation: PepLiP = β0 + β1 × A + β2 × C + β3 × S + β4 × A × C + β5 × A × S + β6 × C × S + β7 × PepTO + β8 × ProtTO + β9 × TPC + β10 × Batch
PK-independent peptide variation: PepTO = β0 + β1 × A + β2 × C + β3 × S + β4 × A × C + β5 × A × S + β6 × C × S + β7 × ProtTO + β8 × TPC + β9 × Batch
Protein abundance variation: ProtTO = β0 + β1 × A + β2 × C + β3 × S + β4 × A × C + β5 × A × S + β6 × C × S + β7 × TPC + β8 × Batch
b, P-value distribution of the cohort variable: CSF (x axis, P value; y axis, count; series: structural variation (LiP), PK-independent variation, protein abundance).
c, P-value distribution of the structural variation model: CSF (x axis, P value; y axis, count; series: age, cohort, sex).
d, Protein-level FDR: CSF (x axis, coefficient; y axis, –log10 (protein-wise FDR)).
e, GO-term analysis: CSF (x axis, –log10(P value); terms grouped as biological process, molecular function, and cellular component): regulation of presynapse assembly; positive regulation of synapse assembly; negative regulation of hydrolase activity; organic anion transport; cation binding; vitamin binding; calmodulin binding; carboxylic ester hydrolase activity; excitatory synapse; integral component of postsynaptic density; presynaptic membrane; cell membrane microparticle; neuron projection; actin cytoskeleton; glutamatergic synapse.
f, P-value distribution of the cohort variable: brain (x axis, P value; y axis, count; series as in b).]
peptide in PD maps to the center of a homotrimeric complex required
for synapse integrity and plasticity58 and could thus reflect complex
disassembly and an alteration of synapse integrity in PD. Notably,
in addition to CBLN1, we identified structural variation in many
other synapse-organizing proteins, for example SLIT and NTRK-like
protein 5 (SLITRK5) and 1 (SLITRK1), neuronal pentraxin-1 (NPTX1),
leucine-rich alpha-2-glycoprotein (LRG1), cell adhesion molecule 1
(CADM1) and 3 (CADM3/NECL-1), neuronal cell adhesion molecule
(NrCAM), and disintegrin and metalloproteinase domain-containing
protein 11 (ADAM11)59–61. Because PD is a known synaptopathy62, the
enrichment for structurally altered synaptic proteins strongly suggests
that our analysis captures PD-relevant changes.
Vitamin D deficiency has been reported in PD, including a relationship
between the severity of motor symptoms and the level of
deficiency63,64. To assess whether the structural change we observed for
the vitamin-D-binding protein (GC) (Fig. 4h–j) could reflect substrate
binding, we compared LiP–MS patterns of the purified protein with
and without its known binder 1,25 dihydroxy vitamin D (calcitriol),
the active form of vitamin D. The single in situ-measured structural
peptide also changed upon calcitriol addition in vitro (Fig. 4k–l and
Supplementary Table 6), although other calcitriol-dependent peptides
were not well covered in situ. These data are therefore consistent with
an in situ structural change upon vitamin D (calcitriol) binding, but
further validation is needed. We mapped structural changes for sev-
eral other interesting proteins, including the enzymes phosphoserine
aminotransferase (PSAT1) and butyrylcholinesterase (BCHE), afamin
(AFM), and serum amyloid P-component (SAMP) (Extended Data Fig. 4).
In summary, we observe structural variation in various classes of
proteins, including enzymes and proteins involved in synapse organiza-
tion and function, that are strongly linked to PD. Structurally altered
proteins in the CSF of individuals with PD are, when detected, often
altered in the PD brain as well, suggesting that some of our candidate
structural biomarkers might be linked to pathological processes. Map-
ping the altered peptides to 3D protein structures suggests hypotheses
for functional changes between the healthy and diseased states that
could be followed up in mechanistic studies.
Cerebrospinal fluid structural peptides can be used to classify
healthy versus Parkinson’s disease groups
We next asked whether combinations of structural peptides could be
used to classify individuals as belonging to the healthy or PD group,
because biomarker combinations can improve sensitivity and specificity65.
We used 5-fold cross-validation with evenly distributed healthy
and PD samples, generated training and test sets, and defined potential
biomarkers by fitting linear models to the training set, using peptide
combinations. We devised a method that removes the effects of covari-
ates (age, sex, total protein content, batch) on the signal of each peptide
and fed these corrected peptide signals into a regularized regression
algorithm for finding 1, 5, or 10 peptides that together were predictive
of PD status (Fig. 5a). In each run of this analysis, the models were built
using a subset of the data, and their performance was subsequently
evaluated with test samples that were not used for training. We applied
this analysis to LiP peptides (Fig. 5b), to trypsin-only peptides (Fig. 5c),
and to protein abundances (Fig. 5d). Although all models performed
better than expected by chance, the models using combinations of
structural changes outperformed the models using non-structural
changes (Fig. 5b–d; AUC of 0.752, 0.675, and 0.673 for models with 5
optimal predictors selected from LiP peptides, PK-independent
peptides, or protein abundances, respectively). A model based on a single
LiP peptide (AUC 0.684) performed as well as those based on 10
PK-independent peptide or protein abundance features (AUC 0.674 and
0.688, respectively). We also combined structural and abundance changes
in a single model, which only marginally increased discriminatory power
(Extended Data Fig. 5a–c). The superior performance of the structural model is unlikely
to be due to overfitting (Methods), and the relative discriminatory
power of structural peptides versus protein abundance is likely to
be underestimated because our dataset is small and single peptide
measurements are much noisier than protein abundance measure-
ments. Together, these data confirm that protein structural differences
between healthy and PD CSF contain more discriminatory information
than protein abundance information.
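The evaluation scheme described above can be sketched in a few lines of Python. The sketch below shows only two generic building blocks: a class-balanced assignment of samples to five folds, and the AUC computed as the probability that a randomly chosen PD sample scores higher than a randomly chosen healthy sample. The function names are hypothetical, the covariate correction and LASSO fitting are deliberately omitted, and this is not the implementation used for Fig. 5.

```python
def stratified_folds(labels, k=5):
    """Assign sample indices to k folds so that each class is spread evenly."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        # Deal the members of each class round-robin across the folds.
        for position, idx in enumerate(members):
            folds[position % k].append(idx)
    return folds


def roc_auc(scores, labels):
    """Area under the ROC curve; labels are 1 (PD) or 0 (healthy).

    Computed as the fraction of (PD, healthy) pairs in which the PD sample
    has the higher score, with ties counted as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))
```

In each round, a model would be fitted on four folds and scored on the held-out fold, so that the reported AUC reflects only samples that were not used for training.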
Next, we compared the performance of our structure-based
classification to ELISA-based measurements of different species of
CSF α-synuclein, the best currently available potential biomarker66,67.
α-Synuclein peptides were not included in the MS-based analysis
because they were not well detected. Classifications on the basis of
LiP peptides (AUC of 0.752, Fig. 5b; AUC of 0.756, 0.761, and 0.783,
Extended Data Fig. 5a–c) slightly outperformed those on the basis of
total α-synuclein (AUC of 0.733), but not those on the basis of oligomeric
α-synuclein (AUC of 0.799) or the ratio of oligomeric to total α-synuclein (AUC of
0.838) (Extended Data Fig. 5d).
We investigated whether a combination of α-synuclein and LiP
peptide measures would further enhance the classification. We computed
log (odds) using the ratio of oligomeric to total α-synuclein and a
model of five LiP peptides (using cross-validation; Fig. 5e). The overall
accuracy (that is, individuals correctly classified as healthy or having
PD) of each model was similar (73%, 75%), and was greatly improved
when both models agreed (91%) (Fig. 5e). Notably, the α-synuclein
model misclassified people with PD (n=12) with low oligomer/total
α-synuclein ratios, which were due to low oligomeric α-synuclein lev-
els (Extended Data Fig. 5e,f); most of these participants (n=10) were
correctly classified by the LiP model. Donors correctly classified as
having PD by LiP, but not by α-synuclein measures, had disease that had
progressed relatively far (Extended Data Fig. 5g); this group included
two donors with unusually long disease duration (16 years and 21 years).
Overall, for the 38 individuals classified as having PD by only one of the
two models, about half were true positives (8/15 for the α-synuclein
model, 10/23 for the LiP model). Hence, neither the oligomeric/total
α-synuclein ratio nor the combined structural peptides classified all
individuals with PD correctly. However, the two sets of measurements
harbor complementary information on disease status and, together,
correctly classified almost all participants.
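The arithmetic behind these figures can be checked with a short Python sketch that uses the quadrant counts printed in Fig. 5e. The assignment of each count pair to a quadrant is an assumption inferred from the numbers quoted in the text (12 misclassified individuals with PD, 8/15 and 10/23 true positives); the variable and function names are hypothetical.

```python
# Quadrant counts from Fig. 5e, keyed by (aSyn call, LiP call).
# Values are (healthy donors, donors with PD). The mapping of counts to
# quadrants is an assumption inferred from the text, not stated in the figure.
COUNTS = {
    ("PD", "PD"): (3, 24),
    ("healthy", "healthy"): (24, 2),
    ("healthy", "PD"): (13, 10),
    ("PD", "healthy"): (7, 8),
}


def agreement_accuracy(counts):
    """Accuracy restricted to individuals on whom both models agree."""
    hg_both_pd, pd_both_pd = counts[("PD", "PD")]
    hg_both_healthy, pd_both_healthy = counts[("healthy", "healthy")]
    correct = pd_both_pd + hg_both_healthy
    total = hg_both_pd + pd_both_pd + hg_both_healthy + pd_both_healthy
    return correct / total


def discordant_true_positives(counts, caller):
    """Return (true PD, total) among individuals called PD by one model only."""
    key = ("PD", "healthy") if caller == "aSyn" else ("healthy", "PD")
    healthy, pd = counts[key]
    return pd, healthy + pd
```

With these counts, both models agree on 53 individuals, 48 of whom are classified correctly, which corresponds to the reported 91%.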
Fig. 4 | Structural changes in selected proteins that are altered between
healthy individuals and those with PD. a, Peptide coverage plot for NrCAM
in CSF (top) and brain (bottom). Black indicates all analyzed peptides; red
indicates significantly altered structural peptides between PD and healthy
samples, after correction for covariates. Half-tryptic peptides were not
visualized. Ig-like (dark gray) and fibronectin type-III domains (light gray) are
highlighted. b,c, AlphaFold-predicted structure of PITHD1, colored according to
peptides in the CSF (b) and brain (c) data. Peptide colors are as in a. d, Peptide
coverage plot for PITHD1 in CSF (top) and brain (bottom). Peptide colors are as in
a; half-tryptic peptides are in gray. e,f, Structure (e) and peptide coverage plot (f)
of homotrimeric CBLN1 (PDB 5KC6), colored as in a, according to peptides in
CSF data; half-tryptic peptides are in gray (f). g, Scaled residuals for healthy
individuals and for those with PD for the indicated peptide of CBLN1. Box
plots: median, center; first and third quantile, lower and upper hinges; largest/
smallest value no further than 1.5 × inter-quantile range of the hinge, whiskers;
data points beyond are defined as outliers and are plotted individually (n=51 participants
in HG and n=49 participants in PDG). h, Structure of vitamin-D-binding
protein GC (PDB 1J78) with bound vitamin D (yellow spheres), colored as in a,
according to peptides from in situ CSF data. i, Peptide coverage plot for GC in
CSF. Colors are as in d. The light blue box indicates the protein region of the
altered structural peptide. j, Scaled residual for the indicated peptide of GC,
as in g (n=51 participants each in HG and PDG). k, Structure of the vitamin-D-
binding protein (as in h), colored according to the peptides that change upon
substrate (calcitriol) addition in vitro. Red, significantly changed peptides
(adjusted P values < 0.05, log2(fold change)<−1 or > 1, P values estimated with
a two-sided t-test); white, detected but unchanged peptides. The blue circle
encloses the peptide overlapping with the in situ-identified altered peptide.
l, Coverage plot for significant peptides from k.
[Fig. 4 panel content, recovered from figure text.
a, NrCAM, neuronal cell adhesion molecule (Q92823): coverage plots for CSF and brain (x axis, protein position; y axis, coefficient); Ig-like domains and fibronectin type-III domains indicated.
b–d, PITHD1, PITH domain-containing protein 1 (Q9GZP4): structures for CSF (b) and brain (c), and coverage plots for CSF and brain (d; x axis, protein position; y axis, coefficient).
e–g, CBLN1, cerebellin (P23435): structure, coverage plot (x axis, protein position; y axis, coefficient), and scaled residuals by cohort (HG, PDG) for the CSF peptide GIYSFNFHVVK.
h–j, GC, vitamin D-binding protein (P02774), in situ CSF: structure, coverage plot (x axis, protein position; y axis, coefficient), and scaled residuals by cohort (HG, PDG) for the peptide YTFELSR.
k,l, GC in vitro: structure and coverage plot (x axis, protein position; y axis, log2(fold change)).
Legend: candidate peptides, fully tryptic peptides, half-tryptic peptides, significant peptides.]
Overall, our data provide strong support for the hypothesis that
protein structural changes provide more information about the PD
state than protein abundance information. Further, our results suggest
that global structural analyses of the CSF proteome may identify PD
biomarkers complementary to existing α-synuclein-based measures,
which may be combined with these measures to improve power to
classify individuals with the disease.
Discussion
We have demonstrated that global in situ analyses of protein structures
from a body fluid can generate a novel type of molecular readout of a
human disease state. Protein structural alterations in human CSF better
distinguished healthy individuals from those with PD than did protein
abundance changes. We thus show evidence, to our knowledge for
the first time, for the concept that global protein structural analyses
could identify a new type of structural biomarker of a human disease,
with potentially improved performance over classical CSF biomarkers
based on protein levels.
We have identified proteins that are structurally altered between
the CSF of healthy individuals and those with PD, and have substantially
enlarged the set of proteins known to be structurally changed in PD. Per-
formance of our candidate markers alone approached that of the ratio
of oligomeric/total α-synuclein, which is the main known biochemical
player in PD and is currently under investigation as a potential PD bio-
marker. We further showed that our structurally informative peptides
provided complementary information to oligomeric/total α-synuclein
levels and that combining these measures classified healthy individuals
and those with PD with better performance (91% accuracy) than either
measure alone (75% accuracy). In future studies, the signal of our can-
didate markers could be increased by optimizing enzyme-to-substrate
ratios and incubation times in the LiP step. After correction for covariates,
we observed only a few changes in protein abundance between the
CSF of healthy individuals and those with PD, some of which have been
previously reported15,18,68–77.
Translation of this concept to a clinical setting will nevertheless
require additional work. The structural changes require validation
in further independent cohorts and must be tested for specificity
to PD versus other neurodegenerative diseases. Because bottom-up
proteomics is challenging to implement in a clinical setting, validated
markers could in the future form the basis for targeted proteomics or
conformation-specific antibody-based assays to distinguish between
the structural states of proteins in biofluids of people with PD. Although
such developments face many challenges, including identification of
stable structural states against which conformation-specific antibod-
ies can be raised, our work provides a proof of concept for the use of
a global structural analysis as a new type of readout to distinguish
between the healthy and the disease states.
PD is a heterogeneous disease in which individuals either present
only motor symptoms or also show various cognitive problems12. A
better understanding of disease manifestation, early diagnosis, and
[Fig. 5 panel content, recovered from figure text.
a, Workflow: five-fold cross-validation with test and training folds drawn from HG and PDG; multilinear regression based on training folds for structural, PK-independent peptide, or protein variation; scaled residuals estimated for significant peptides or proteins in training and test samples using the linear model based on training data; models trained using training residuals to classify PD through LASSO regression, then evaluated using test samples.
b, LiP peptide-level prediction: AUC 0.684 (1 predictor), 0.752 (5 predictors), 0.769 (10 predictors).
c, PK-independent peptide-level prediction: AUC 0.657 (1 predictor), 0.675 (5 predictors), 0.674 (10 predictors).
d, Protein-level prediction: AUC 0.574 (1 predictor), 0.673 (5 predictors), 0.688 (10 predictors).
Axes in b–d: FPR (x axis), TPR (y axis).
e, Complementary information by LiP and aSyn measurements: x axis, log (odds) for the aSyn ratio; y axis, log (odds) for LASSO with 5 predictors; points colored by cohort (HG, PDG). Quadrant counts as printed: HG: 13, PDG: 10; HG: 3, PDG: 24; HG: 7, PDG: 8; HG: 24, PDG: 2.]
Fig. 5 | Classification of Parkinson’s disease on the basis of CSF proteome
information. a, Schematic of the analysis workflow for classification of
samples as PD or healthy. b, Receiver operating characteristic (ROC) curves
corresponding to the classification of PD on the basis of LiP peptide variation.
c, ROC curves corresponding to the classification of PD on the basis of the
PK-independent peptide variation. d, ROC curves corresponding to the
classification of PD on the basis of the trypsin-only protein abundance. LASSO
models were built using 1, 5, or 10 predictors, and the average classification
across multiple cross-validations is used for the ROC curves. e, Comparison of
classification using the ratio of oligomeric to total α-synuclein (log (odds) plotted
on x axis) and using a combination of five LiP peptide levels (log (odds) plotted
on y axis). Each point represents an individual, and the cohort membership
is indicated by color. Samples with a log (odds) below zero were classified as
healthy and those with a log (odds) above zero as having PD. The numbers in each
quadrant indicate the results of the classification.
subtyping of individuals with the disease is essential to direct the choice
of treatment. Our structural peptides performed well at classifying
those with low oligomer/total α-synuclein ratios, which were misclas-
sified by the α-synuclein-based measures, although these individuals
with PD met the diagnostic criteria of clinically probable or established
PD78. Interestingly, it has previously been shown that only about 89%
of individuals with PD classified by the International Parkinson and
Movement Disorder Society clinical diagnostic criteria for PD show
Lewy body pathology at autopsy79. Individuals classified as having PD
by our new structural signature, but not by α-synuclein measures, may
represent those lacking Lewy body pathology, and this will require
more detailed studies. In general, multidimensional peptide-based
structural markers may have the potential to distinguish between
disease subtypes, and therefore to stratify individuals according to
prognosis. On the basis of our data, multidimensional structure-based
markers are likely to be more sensitive than abundance-based ones.
These predictions, however, remain to be tested.
Our data also show that a global in situ structural analysis of disease
samples provides information about pathological mechanisms.
Although our study was designed to identify potential structural bio-
markers in CSF, whereas PD pathology manifests in brain tissue, the set
of structurally altered CSF proteins in PD captures pathways known to
be involved in the disease. Specifically, we identified multiple proteins
involved in synapse organization, known altered enzymes, hormone
and vitamin transport proteins implicated in PD, and proteins related
to amyloid-clearance pathways. The enrichment of synaptic proteins
is particularly intriguing because PD is a known synaptopathy. This
suggests that, even in the CSF, we have identified structurally altered
proteins that may reflect pathology in the PD brain. Indeed, analysis of
postmortem brain samples from a small independent cohort showed
that, for the structurally changed CSF proteins that we could detect in
the brain, more than half showed structural changes in both samples,
and many were strongly linked to the disease. It was expected that the
set of overlapping detected proteins was small and that very few identi-
cal structural changes were observed in the PD brain and CSF, given that
we were comparing samples with differing proteomes, interactomes,
metabolomes, and biophysical properties. Nevertheless, these data are
encouraging and suggest that the application of this approach to brain
samples from a larger cohort of people with PD will provide insight into
structural changes in the disease state.
We also present the first global analysis of a structural proteome in
healthy individuals. Our dataset of variable and invariable CSF proteins
across healthy humans is an important starting point for future struc-
tural biomarker development. It should inform study design to achieve
sufficient statistical power for the identification of new biomarkers,
and it indicates proteins (that is, those with high inter-individual vari-
ability) that are not likely to be useful for this purpose. It should also
serve as a useful resource for structural biology because it identifies
those structurally variable proteins for which in vitro structural data
may be particularly difficult to extrapolate to in vivo state and function.
Finally, our dataset should enable further study of protein structural
changes during human aging.
Although our work has focused on PD, the approach is applicable
to any disease involving dysfunctional proteins with altered structural
states. For instance, constitutively active kinases or non-functional
tumor suppressors in cancer would be expected to show altered struc-
tures and possibly interactomes. Structural-proteomics screens could
capture such changes and prove to be a powerful tool for a more sys-
temic understanding of human disease states.
Online content
Any methods, additional references, Nature Research reporting
summaries, source data, extended data, supplementary informa-
tion, acknowledgements, peer review information; details of
author contributions and competing interests; and statements of
data and code availability are available at https://doi.org/10.1038/
s41594-022-00837-0.
References
1. Kennedy, B. K. et al. Geroscience: linking aging to chronic
disease. Cell 159, 709–713 (2014).
2. Cilento, E. M. et al. Mass spectrometry: a platform for biomarker
discovery and validation for Alzheimer’s and Parkinson’s diseases.
J. Neurochem. 151, 397–416 (2019).
3. Crutchield, C. A., Thomas, S. N., Sokoll, L. J. & Chan, D. W.
Advances in mass spectrometry-based clinical biomarker
discovery. Clin. Proteom. 13, 1 (2016).
4. Jiang, R. et al. Dierential proteomic analysis of serum exosomes
reveals alterations in progression of Parkinson disease. Medicine
98, e17478 (2019).
5. Macklin, A., Khan, S. & Kislinger, T. Recent advances in mass
spectrometry based clinical proteomics: applications to cancer
research. Clin. Proteomics 17, 17 (2020).
6. Thygesen, C., Boll, I., Finsen, B., Modzel, M. & Larsen, M. R.
Characterizing disease-associated changes in post-translational
modiications by mass spectrometry. Expert Rev. Proteom. 15,
245–258 (2018).
7. Tzeng, S. R. & Kalodimos, C. G. Protein activity regulation by
conformational entropy. Nature 488, 236–240 (2012).
8. Henzler-Wildman, K. & Kern, D. Dynamic personalities of proteins.
Nature 450, 964–972 (2007).
9. Schopper, S. et al. Measuring protein structural changes on a
proteome-wide scale using limited proteolysis-coupled mass
spectrometry. Nat. Protoc. 12, 2391–2410 (2017).
10. Feng, Y. et al. Global analysis of protein structural changes in
complex proteomes. Nat. Biotechnol. 32, 1036–1044 (2014).
11. Cappelletti, V. et al. Dynamic 3D proteomes reveal protein
functional alterations at high resolution in situ. Cell 184,
545–559.e22 (2021).
12. Spillantini, M. G. et al. α-synuclein in Lewy bodies. Nature 388,
839–840 (1997).
13. Braak, H. et al. Staging of brain pathology related to sporadic
Parkinson’s disease. Neurobiol. Aging 24, 197–211 (2003).
14. Brás, I. C., Xylaki, M. & Outeiro, T. F. Mechanisms of
alpha-synuclein toxicity: an update and outlook. Prog. Brain. Res.
252, 91–129 (2020).
15. Maass, F., Schulz, I., Lingor, P., Mollenhauer, B. & Bähr, M.
Cerebrospinal luid biomarker for Parkinson’s disease: an
overview. Mol. Cell. Neurosci. 97, 60–66 (2019).
16. Borrageiro, G., Haylett, W., Seedat, S., Kuivaniemi, H. & Bardien, S.
A review of genome-wide transcriptomics studies in Parkinson’s
disease. Eur. J. Neurosci. 47, 1–16 (2018).
17. Majbour, N. K. et al. Oligomeric and phosphorylated
alpha-synuclein as potential CSF biomarkers for Parkinson’s
disease. Mol. Neurodegener. 11, 7 (2016).
18. Parnetti, L. et al. CSF and blood biomarkers for Parkinson’s
disease. Lancet Neurol. 18, 573–586 (2019).
19. van Dijk, K. D. et al. Changes in endolysosomal enzyme
activities in cerebrospinal luid of patients with Parkinson’s
disease. Mov. Disord. 28, 747–754 (2013).
20. van Steenoven, I. et al. α-Synuclein species as potential
cerebrospinal luid biomarkers for dementia with lewy bodies.
Mov. Disord. 33, 1724–1733 (2018).
21. Van Dijk, K. D. et al. Cerebrospinal luid and plasma clusterin
levels in Parkinson’s disease. Park. Relat. Disord. 19, 1079–1083
(2013).
22. van Dijk, K. D. et al. Reduced α-synuclein levels in cerebrospinal
luid in Parkinson’s disease are unrelated to clinical and
imaging measures of disease severity. Eur. J. Neurol. 21,
388–394 (2014).
23. Abdi, I. Y. et al. Preanalytical stability of CSF total and oligomeric
α-synuclein. Front. Aging Neurosci. 13, 85 (2021).
24. ElAgnaf, O. M. A. et al. Detection of oligomeric forms of
αsynuclein protein in human plasma as a potential biomarker
for Parkinson’s disease. FASEB J. 20, 419–425 (2006).
25. Oosterveld, L. P. et al. CSF biomarkers reflecting protein
pathology and axonal degeneration are associated with memory,
attentional, and executive functioning in early-stage Parkinson’s
disease. Int. J. Mol. Sci. 21, 1–12 (2020).
26. Macron, C., Lane, L., Núnez Galindo, A. & Dayon, L. Deep dive
on the proteome of human cerebrospinal fluid: a valuable
data resource for biomarker discovery and missing protein
identification. J. Proteome Res. 17, 4113–4126 (2018).
27. Barkovits et al. Blood contamination in CSF and its impact on
quantitative analysis of α-synuclein. Cells 9, 370 (2020).
28. Macron, C. et al. Exploration of human cerebrospinal fluid: a large
proteome dataset revealed by trapped ion mobility time-of-flight
mass spectrometry. Data Brief 31, 105704 (2020).
29. Zhang, J. & Kurgan, L. SCRIBER: accurate and partner
type-speciic prediction of protein-binding residues from proteins
sequences. Bioinformatics 35, i343–i353 (2019).
30. Jones, D. T. & Cozzetto, D. DISOPRED3: precise disordered
region predictions with annotated protein-binding activity.
Bioinformatics 31, 857–863 (2015).
31. Beecham, G. W. et al. PARK10 is a major locus for sporadic
neuropathologically confirmed Parkinson disease. Neurology 84,
972–980 (2015).
32. Nalls, M. A. et al. Identification of novel risk loci, causal insights,
and heritable risk for Parkinson’s disease: a meta-analysis
of genome-wide association studies. Lancet Neurol. 18,
1091–1102 (2019).
33. Chang, D. et al. A meta-analysis of genome-wide association
studies identiies 17 new Parkinson’s disease risk loci. Nat. Genet.
2017, 1511–1516 (2017).
34. Hoxha, E., Tempia, F., Lippiello, P. & Miniaci, M. C. Modulation,
plasticity and pathophysiology of the parallel fiber-Purkinje cell
synapse. Front. Synaptic. Neurosci. 8, 35 (2016).
35. Lozovaya, N. et al. GABAergic inhibition in dual-transmission
cholinergic and GABAergic striatal interneurons is abolished in
Parkinson disease. Nat. Commun. 9, 1–14 (2018).
36. Zheng, X. et al. Increase in glutamatergic terminals in the striatum
following dopamine depletion in a rat model of Parkinson’s
disease. Neurochem. Res. 44, 1079–1089 (2019).
37. Gardoni, F., Ghiglieri, V., Luca, M. di & Calabresi, P. Assemblies of
glutamate receptor subunits with post-synaptic density proteins
and their alterations in Parkinson’s disease. Prog. Brain Res. 183,
169–182 (2010).
38. Błaszczyk, J. W. Parkinson’s disease and neurodegeneration:
GABA-collapse hypothesis. Front. Neurosci. 10, 269 (2016).
39. Kayakabe, M. et al. Motor dysfunction in cerebellar Purkinje
cell-speciic vesicular GABA transporter knockout mice. Front.
Cell. Neurosci. 7, 286 (2014).
40. Murueta-Goyena, A., Andikoetxea, A., Gómez-Esteban, J. C. &
Gabilondo, I. Contribution of the GABAergic system to non-motor
manifestations in premotor and early stages of Parkinson’s
disease. Front. Pharmacol. 10, 1294 (2019).
41. Surmeier, D. J. et al. Calcium and Parkinson’s disease. Biochem.
Biophys. Res. Commun. 483, 1013–1019 (2017).
42. Pchitskaya, E., Popugaeva, E. & Bezprozvanny, I. Calcium signaling
and molecular mechanisms underlying neurodegenerative
diseases. Cell Calcium 70, 87–94 (2018).
43. Latourelle, J. C. et al. Genomewide association study for
onset age in Parkinson disease. BMC Med. Genet. 10, 98
(2009).
44. Blauwendraat, C. et al. Parkinson’s disease age at onset
genome-wide association study: deining heritability, genetic loci,
and α-synuclein mechanisms. Mov. Disord. 34, 866–875 (2019).
45. Tan, M. M. X. et al. Genome-wide association studies of cognitive
and motor progression in Parkinson’s disease. Mov. Disord. 36,
424–433 (2021).
46. Wilhelmus, M. M. M. et al. Short communication apolipoprotein E
and LRP1 increase early in Parkinson’s disease pathogenesis. Am.
J. Pathol. 179, 2152–2156 (2011).
47. Troy T, R. & Jacob M, M. Apolipoprotein E fragmentation within
lewy bodies of the human Parkinson’s disease brain. Int. J.
Neurodegener. Disord. 1, 002 (2018).
48. Xu, J., Mashimo, T. & Südhof, T. C. Synaptotagmin-1, -2, and -9:
Ca2+ sensors for fast release that specify distinct presynaptic
properties in subsets of neurons. Neuron 54, 567–581 (2007).
49. Delignat-Lavaud, B. et al. The calcium sensor synaptotagmin-1 is
critical for phasic axonal dopamine release in the striatum and
mesencephalon, but is dispensable for basic motor behaviors in
mice. Preprint at bioRxiv https://doi.org/10.1101/2021.09.15.460511
(2021).
50. Wu, M., Puddifoot, C. A., Taylor, P. & Joiner, W. J. Mechanisms
of inhibition and potentiation of α4β2 nicotinic acetylcholine
receptors by members of the Ly6 protein family. J. Biol. Chem.
290, 24509 (2015).
51. Wang, Q. et al. The landscape of multiscale transcriptomic
networks and key regulators in Parkinson’s disease. Nat. Commun.
2019, 1–15 (2019). 101 10.
52. Power, J. H. T., Shannon, J. M., Blumbergs, P. C. & Gai, W. P.
Nonselenium glutathione peroxidase in human brain: elevated
levels in Parkinson’s disease and dementia with Lewy bodies. Am.
J. Pathol. 161, 885–894 (2002).
53. Corradini, B. R. et al. Complex network-driven view of genomic
mechanisms underlying Parkinson’s disease: Analyses in dorsal
motor vagal nucleus, locus coeruleus, and substantia nigra.
Biomed. Res. Int. 543673 (2014).
54. Lachén-Montes, M. et al. Unveiling the olfactory proteostatic
disarrangement in Parkinson’s disease by proteome-wide
proiling. Neurobiol. Aging 73, 123–134 (2019).
55. Lachén-Montes, M. et al. Smelling the dark proteome: functional
characterization of PITH domain-containing protein 1 (C1orf128) in
olfactory metabolism. J. Proteome Res. 19, 4826–4843 (2020).
56. Ma, S. et al. Peroxiredoxin 6 is a crucial factor in the initial step
of mitochondrial clearance and is upstream of the PINK1–parkin
pathway. Antioxid. Redox Signal. 24, 486–501 (2016).
57. Yun, H. M., Choi, D. Y., Oh, K. W. & Hong, J. T. PRDX6 exacerbates
dopaminergic neurodegeneration in a MPTP mouse model of
Parkinson’s disease. Mol. Neurobiol. 52, 422–431 (2015).
58. Elegheert, J. et al. Structural basis for integration of GluD
receptors within synaptic organizer complexes. Science 353,
295–300 (2016).
59. Chipman, P. & Goda, Y. in Dendrites: Development and Disease
425–465 (Springer Japan, 2016).
60. Won, S. Y., Lee, P. & Kim, H. M. Synaptic organizer: Slitrks and type
IIa receptor protein tyrosine phosphatases. Curr. Opin. Struct.
Biol. 54, 95–103 (2019).
61. Lee, S. J. et al. Presynaptic neuronal pentraxin receptor organizes
excitatory and inhibitory synapses. J. Neurosci. 37, 1062–1080
(2017).
62. Longhena, F., Faustini, G., Spillantini, M. G. & Bellucci, A. Living in
promiscuity: the multiple partners of α-synuclein at the synapse
in physiology and pathology. Int. J. Mol. Sci. 20, 141 (2019).
63. Fullard, M. E. & Duda, J. E. A review of the relationship between
vitamin D and Parkinson disease symptoms. Front. Neurol. 11, 454
(2020).
64. Lawton, M. et al. Blood biomarkers with Parkinson’s disease
clusters and prognosis: the oxford discovery cohort. Mov. Disord.
35, 279–287 (2020).
65. Li, T. & Le, W. Biomarkers for Parkinson’s disease: how good are
they? Neurosci. Bull. 36, 183–194 (2020).
66. Kang, U. J. et al. Comparative study of cerebrospinal fluid
α-synuclein seeding aggregation assays for diagnosis of
Parkinson’s disease. Mov. Disord. 34, 536–544 (2019).
67. Rossi, M. et al. Ultrasensitive RT-QuIC assay with high sensitivity
and speciicity for Lewy body-associated synucleinopathies. Acta
Neuropathol. 140, 49–62 (2020).
68. Rotunno, M. S. et al. Cerebrospinal fluid proteomics implicates the
granin family in Parkinson’s disease. Sci. Rep. 2020, 1–11 (2020).
69. Eusebi, P. et al. Cerebrospinal fluid biomarkers for the diagnosis
and prognosis of Parkinson’s disease: protocol for a systematic
review and individual participant data meta-analysis. BMJ Open 7,
e018177 (2017).
70. Simrén, J., Ashton, N. J., Blennow, K. & Zetterberg, H. An update on
fluid biomarkers for neurodegenerative diseases: recent success
and challenges ahead. Curr. Opin. Neurobiol. 61, 29–39 (2020).
71. Dixit, A., Mehta, R. & Singh, A. K. Proteomics in human Parkinson’s
disease: present scenario and future directions. Cell. Mol.
Neurobiol. 39, 901–915 (2019).
72. Parnetti, L. et al. Parkinson’s and Lewy body dementia CSF
biomarkers. Clin. Chim. Acta 495, 318–325 (2019).
73. Heywood, W. E. et al. Identification of novel CSF biomarkers for
neurodegeneration and their validation by a high-throughput
multiplexed targeted proteomic assay. Mol. Neurodegener. 10,
64 (2015).
74. Magdalinou, N. K. et al. Identification of candidate cerebrospinal
fluid biomarkers in parkinsonism using quantitative proteomics.
Parkinsonism Relat. Disord. 37, 65–71 (2017).
75. Magdalinou, N., Lees, A. J. & Zetterberg, H. Cerebrospinal
fluid biomarkers in parkinsonian conditions: an update and
future directions. J. Neurol. Neurosurg. Psychiatry 85,
1065–1075 (2014).
76. Sarkar, A., Rawat, N., Sachan, N. & Singh, M. P. Unequivocal
biomarker for Parkinson’s disease: a hunt that remains a pester.
Neurotox. Res. 36, 627–644 (2019).
77. Liu, W. et al. Role of exosomes in central nervous system diseases.
Front. Mol. Neurosci. 12, 240 (2019).
78. Postuma, R. B. et al. MDS clinical diagnostic criteria for Parkinson’s
disease. Mov. Disord. 30, 1591–1601 (2015).
79. Geut, H. et al. Neuropathological correlates of parkinsonian
disorders in a large Dutch autopsy series. Acta Neuropathol.
Commun. 8, 39 (2020).
Publisher’s note Springer Nature remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.
Springer Nature or its licensor holds exclusive rights to this
article under a publishing agreement with the author(s) or other
rightsholder(s); author self-archiving of the accepted manuscript
version of this article is solely governed by the terms of such
publishing agreement and applicable law.
© The Author(s), under exclusive licence to Springer Nature America,
Inc. 2022
Methods
Study cohorts
Cerebrospinal luid study cohort. The study population consisted
of 52 individuals with PD (Age: 41–84, Male: 39.2%) that attended the
outpatient clinic for movement disorder of the VU University Medical
Center Amsterdam, the Netherlands, and CSF was collected in the period
of September 2008 to February 2012, as described previously22.
Individuals with known monogenic forms of PD were excluded. Those with PD
fulfilled the United Kingdom Parkinson’s Disease Society Brain Bank
clinical diagnostic criteria80. Severity of parkinsonism and disease state
were rated using the Unified Parkinson’s Disease Rating Scale (UPDRS) motor
subscale and the modified Hoehn and Yahr (H&Y) classification81,82. The
participants underwent a mini-mental-state examination to exclude
dementia. The disease duration was calculated from the first motor
symptoms as reported by the participants to the time of diagnosis. An
age- and sex-matched healthy control cohort group consisted of 51 indi-
viduals (age: 51–82, male: 65.4%). For the healthy cohort group, dementia
was excluded using the Cambridge Cognitive Examination scale (CAM-
COG)83. All participants underwent an extensive, standardized clinical
assessment, including medical history and a neurological examination.
The levels of previously identified PD-associated biochemical markers,
such as t-α-Syn, p-α-Syn, and oligomeric α-Syn, were measured by
ELISA using monoclonal epitope-specific antibodies or monoclonal
conformation-specific antibodies, as previously described17,19,25. The
study was approved by the local ethics committee of the VU University
Medical Center, and all participants gave written informed consent.
Participants were reimbursed for travel costs and offered a lunch. No
further compensation was offered. The number of samples per cohort
group was restricted by the availability of suitable biofluid samples. No
statistical methods were used to pre-determine sample size, but our
sample sizes are similar to those reported in previous publications68.
Brain study cohort. The postmortem brain tissue samples were col-
lected by the Netherlands Brain Bank (www.brainbank.nl) in Amster-
dam, the Netherlands, and further characterized by Wilma van de Berg79.
For all donors, a written informed consent for brain autopsy and the
use of the material and clinical information for research purposes had
been obtained from the donor or the next of kin. Demographic features
and clinical symptoms were extracted from the clinical files, including
sex, age at symptom onset, age at death, disease duration, presence
of dementia, and core and supportive clinical features for PD accord-
ing to the International Parkinson and Movement Disorder Society
clinical diagnostic criteria for PD78. Braak and McKeith αSyn stages
were determined using the BrainNet Europe criteria84. On the basis
of Thal amyloid-β phases scored on the medial temporal lobe85, Braak
neurofibrillary stages86, and CERAD neuritic plaque scores87, levels of AD
pathology were determined according to National Institute on Aging
and Alzheimer’s Association consensus criteria88. Additionally, Thal
CAA stages89, presence of aging-related tau astrogliopathy (ARTAG)90,
microvascular lesions and hippocampal sclerosis were assessed. We
selected 5 brain tissue samples of the medial temporal gyrus from female
individuals with sporadic PD and 5 non-neurological controls aged
81–84 years. The study was approved by the local ethics committee of
the VU University Medical Center, Amsterdam, the Netherlands. Donors
and their relatives did not receive any compensation. By signing the
informed consent form, donors gave permission for postmortem brain
autopsy and use of their brain material and medical records for research
purposes.
No statistical methods were used to pre-determine sample sizes.
Statistics and reproducibility
We conducted LiP–MS on CSF samples from a well-characterized clini-
cal cohort consisting of 52 people with PD and 51 healthy age-matched
individuals, with both sexes represented in both cohort groups. Statisti-
cal analyses to correct for covariates as well as for protein abundance
and non-structural effects have been described in detail below. Our
analysis was expected to provide unprecedented proteome-wide
information on structural changes in PD, such that a prior estimate
of effect size was not possible. Therefore, no statistical method was
used to determine sample size, but our sample sizes are similar to
those reported in other publications68. All LiP–MS experiments were
conducted in triplicate and assessed with standard quality-control
measures in the Picotti laboratory, and mass spectrometry runs were
randomized between cohorts to minimize batch effects. Apart from
data from a single individual in the healthy group who lacked protein
abundance measurements and was therefore excluded from the com-
parison between healthy and PD cohort groups, no data were excluded.
Data collection and analysis were not performed blind.
Cerebrospinal fluid collection
CSF was collected from individuals by lumbar puncture and collected
in polypropylene collection tubes. The CSF was analyzed for red blood
cell contamination (see Extended Data Fig. 1a), centrifuged at 1,800g at
4°C for 10minutes, aliquoted, and stored at −80°C within 2hours of
collection until processing, in line with published guidelines91.
Limited proteolysis of cerebrospinal fluid samples
The CSF samples were thawed in batches of seven on ice. Each batch was
designed to have a similar sample distribution with respect to age, sex,
and cohort group. For each sample, the protein concentration was deter-
mined using the Pierce BCA Protein Assay Kit (Thermo Fisher Scientific).
Each sample was split into a trypsin-only and a LiP sample, containing 80
µl of CSF. Both samples were heated to 25 °C for 5 minutes. Proteinase K
from Tritirachium album (Sigma Aldrich) was added to the LiP sample
at an enzyme/substrate (E:S) ratio of 1:100 (wt/wt) and incubated for
5 minutes at 25 °C. The trypsin-only samples were treated with the
corresponding amount of water under the same conditions. The digestion by
proteinase K was stopped by heating all samples to 95 °C for 5 minutes;
samples were then cooled down to 4 °C, and sodium deoxycholate (Sigma
Aldrich) was added to a final concentration of 5%. Cysteine residues
were reduced by adding TCEP (Pierce) to a final concentration of 5 mM
and incubated for 30 minutes at 37 °C with shaking at 600 r.p.m., using
an Eppendorf ThermoMixer C. The reduced cysteines were alkylated by
the addition of IAA (Sigma Aldrich) to a final concentration of 4 mM and
were incubated at room temperature for 30 minutes. Samples were then
diluted with 0.1 M ammonium bicarbonate (AmBic) (Sigma Aldrich) to
2.5% sodium deoxycholate (DOC) before the first digestion step with
LysC (Wako Chemical), at a 1:100 enzyme to substrate ratio (wt/wt) for
120 minutes at 37 °C at 600 r.p.m. After a further dilution of DOC
to 1% with 0.1 M AmBic, samples were digested with trypsin (Promega),
added at a 1:100 enzyme to substrate ratio (wt/wt), for 16 hours at 37 °C
at 600 r.p.m. The digestion was stopped by the addition of formic acid
(Sigma Aldrich) to a final concentration of 2% to result in a pH below 3.
The DOC precipitate was filtered out using a 0.2-µm PVDF hydrophilic
membrane (Corning). The filtered peptide mixture was loaded on
96-well MACROSpin plates (The Nest Group), desalted, and eluted
with 80% acetonitrile and 0.1% formic acid. The peptides were dried
using vacuum centrifugation and resuspended in 0.1% formic acid. The
peptide concentration was determined using the Pierce™ Quantitative
Colorimetric Peptide Assay (Thermo Fisher Scientific).
Limited proteolysis of brain tissue samples
The brain tissue samples were thawed on ice and transferred to a 1.5-ml
tube using 300–400 µl ice-cold lysis buffer (100 mM HEPES pH 7.4,
150 mM KCl, 1 mM MgCl2; one cOmplete protease inhibitor cocktail
tablet (Roche) was added per 50 ml lysis buffer), depending on the
starting weight of the sample. The samples were not randomized because
all brain tissue samples were processed in one batch. The cells were
lysed using a pellet mixer. Samples were homogenized for 8 cycles of
15 strokes each, with a 1-minute break on ice between the cycles. The
lysate was then centrifuged at 9,390g for 10 minutes at 4 °C. The
supernatant was transferred to a new tube and the protein concentration
was determined using the Pierce BCA Protein Assay Kit (Thermo Fisher
Scientific). For each brain tissue sample, a trypsin-only and LiP sample,
in triplicate, was prepared using 40 µg of lysed material. Each sample
was topped up to 50 µl using lysis buffer. The LiP and trypsin-digestion
steps were performed as for the CSF samples, described above.
Liquid chromatography–mass spectrometry methods for
cerebrospinal fluid samples
All chemicals were purchased from Sigma unless otherwise mentioned.
For data collection of all raw MS files, Xcalibur (4.1) was used.
Liquid chromatography–mass spectrometry analysis for library
generation (data-dependent acquisition). For library generation, two
condition pools (HG and PDG) of the LiP-treated and the trypsin-only
samples were prepared (in total 4 pools, ~200 µg per pool). Each pooled
sample was fractionated by high-pH reversed-phase (HPRP) fractionation
using a Dionex Ultimate 3000 LC (Thermo Fisher Scientific) on
an ACQUITY UPLC CSH 1.7-µm C18 column (2.1 × 150 mm, Waters). The
peptides were separated by a non-linear gradient from 1% HPRP buffer
B (100% acetonitrile)/99% HPRP buffer A (20 mM ammonium formate,
pH 10) to 40% buffer B. A fraction was taken every 45 seconds and
fractions were pooled into 15 final fractions. Afterward, peptides were dried
completely in a speed vac, resuspended in solvent A (1% acetonitrile,
0.1% formic acid), and spiked with iRT peptides, according to the
manufacturer’s recommendations (Biognosys). Peptide concentrations were
determined using nano-drop (SPECTROstar Nano, BMG Labtech).
Fractionated samples (2 µg) were separated by a non-linear gradient
from 1% buffer B (85% acetonitrile, 0.1% formic acid in water)/95%
buffer A (1% acetonitrile, 0.1% formic acid in water) to 44% buffer B in
2 hours on an in-house-packed 60-cm column (PicoFrit emitters, inner
diameter 75 µm, New Objective; CSH 1.7-µm column material, Waters)
using an Easy nLC 1200 (Thermo Fisher Scientific) coupled online to
a Q Exactive HF-X mass spectrometer (Thermo Fisher Scientific). The
flow rate was set to 250 nl/min. The Q Exactive HF-X mass spectrometer
was operated in data-dependent acquisition Top15 mode with the following
settings: MS1 scan range: 350–1,650 Th; resolution: 60,000; MS1
AGC target: 3 × 10⁶; MS1 maximum injection time (IT): 25 ms; MS2 scan
resolution: 15,000; MS2 AGC target: 2 × 10⁵; MS2 maximum IT: 25 ms;
isolation window: 4 Th; scan range: 200–2,000 Th; NCE: 27; minimum
AGC target: 1 × 10³; only charge states 2 to 5 considered; peptide match:
preferred; exclude isotopes: on; dynamic exclusion: 20 seconds.
Liquid chromatography–mass spectrometry analysis (data-independent
acquisition). The DIA acquisition method was adapted
from Bruderer, R. et al.92. Peptides (2 µg) were separated by a non-linear
gradient using the same LC–MS setup and gradient as described before.
The mass spectrometer was operated in DIA mode using the following
settings for the MS1 scan: range: 350–1,650 Th; resolution: 120,000;
AGC target: 5 × 10⁶; maximum IT: 20 ms. The MS1 scan was followed by
45 DIA scans using the following settings: resolution: 30,000; AGC
target: 3 × 10⁶; maximum IT: 55 ms; fixed first mass: 200 Th; stepped NCE:
25.5, 27, 30. The window widths were adjusted to the precursor density.
Data analysis: hybrid library generation. The DDA and DIA raw files
were searched separately with SpectroMine 2.0.190613.43665 (Biogno-
sys) against the human UniProt FASTA including isoforms (downloaded
on 1 July 2019). For the trypsin-control samples the following search set-
tings were applied: acetyl (protein N-term) and oxidation (M); enzyme:
trypsin/P with up to two missed cleavages. For the LiP-treated samples,
the digestion specificity was changed to semi-tryptic. Mass tolerances
were automatically determined by SpectroMine, and other settings
were set to default. Search results were filtered by a 1% false-discovery
rate (FDR) on precursor, peptide, and protein level93,94. For hybrid
library generation95, the search archives from the above-mentioned
fractionated samples were used for library generation in SpectroMine.
Default settings were applied during library generation (1% FDR).
Data-independent acquisition data analysis. Prior to analysis of the
DIA data, the raw files were converted into htrms files using the htrms
converter (Biognosys). MS1 and MS2 data were centroided during
conversion. In Spectronaut, imputing was not performed. The other
parameters were set to default. The htrms files were analyzed with
Spectronaut 13 (version: 13.5.190812, Biognosys)96 using the previously
generated hybrid library and default settings. The results were
filtered by a 1% FDR on precursor and protein level (Q value < 0.01)
(Supplementary Tables 1 and 2).
Liquid chromatography–mass spectrometry methods for
brain samples
Liquid chromatography–mass spectrometry analysis for library
generation (data-dependent acquisition). The library for the brain
tissue samples was prepared by pooling all replicates from two or
three individuals into a new sample. In total, four trypsin-only and
LiP-treated pooled samples were generated. Each sample (0.75 µg)
was separated by a linear gradient from 3% buffer B (B: 100% acetoni-
trile, 0.1% formic acid; A: 0.1% formic acid in water) to 35% buffer B in
2hours on an in-house-packed 40cm column (PicoFrit emitters, inner
diameter 75 µm, New Objective; Reprosil-Pur 120 C18-AQ, 1.9µm,
Dr. Maisch) using an ACQUITY UPLC M-Class (Waters) coupled online
to a Orbitrap Fusion Lumos Tribrid (Thermo Fisher Scientific). The flow
rate was set to 300nl/minute. The mass spectrometer was operated
in data-dependent acquisition mode with the following settings: MS1
scan range: 350–1,400m/z; resolution: 120,000; MS1 AGC target: 200%;
MS1 maximum injection time: 100ms; MS2 scan resolution: 30,000;
MS2 AGC target: 200%; MS2 maximum injection time: 54ms; isolation
window: 1.6m/z; scan range: normal; dynamic exclusion: 60seconds.
Liquid chromatography–mass spectrometry analysis
(data-independent acquisition). The peptides (0.75 µg) were separated
by a linear gradient using the same LC–MS setup and gradient
as described before. Prior to injection, iRT peptides were spiked in,
as for the CSF samples. The mass spectrometer was operated in DIA
mode using the following settings for the MS1 scan: range: 350–1,400 m/z;
resolution: 120,000; maximum injection time: 100 ms. The MS1 scan
was followed by 41 DIA scans using the following settings: resolution:
30,000; AGC target: 200%; maximum injection time: 54 ms. The
window widths were adjusted to the precursor density.
Data analysis: hybrid library generation and data-independent
acquisition data analysis. Preparation of the hybrid library and data
analysis were performed similarly as for the CSF data. The DDA and
DIA files of all experiments were searched together with SpectroMine
(2.8.210609.47784, Biognosys) with the same UniProt FASTA file as
used for the CSF. The digestion specificity was set to semi-tryptic,
and all other settings were set to default. All DIA files were analyzed in
Spectronaut 15 (15.1.210713.50606, Biognosys), using the generated
hybrid library, and default settings were applied.
Preprocessing of peptide intensities and protein abundances
Filtering and normalization of peptide intensities. Peptide filtering
and normalization steps were applied to the trypsin-only treated data
and to the proteinase K-treated (LiP) data individually before further
processing steps were performed.
Peptides that were detected in fewer than 50% of the samples and/
or in fewer than 10 samples in each cohort group were excluded from any
further analysis. Additionally, all peptides belonging to albumin were
removed. The numbers of peptides and proteins at various processing
and analysis steps are shown in Supplementary Table 3.
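
The detection filter can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in the study: it assumes a peptides × samples matrix in which NaN marks an undetected peptide, and it reads the criterion as requiring detection in at least 50% of all samples and in at least 10 samples of every cohort group. The function name detection_filter and its parameters are hypothetical.

```python
import numpy as np

def detection_filter(intensities, groups, min_fraction=0.5, min_per_group=10):
    """Boolean mask of peptides (rows) that pass the detection filter.

    intensities: peptides x samples array, NaN where a peptide was not detected.
    groups: cohort label of each sample (one entry per column).
    """
    detected = ~np.isnan(np.asarray(intensities, dtype=float))
    groups = np.asarray(groups)
    # Criterion 1 (assumed reading): detected in at least `min_fraction`
    # of all samples.
    keep = detected.mean(axis=1) >= min_fraction
    # Criterion 2 (assumed reading): detected in at least `min_per_group`
    # samples of every cohort group.
    for label in np.unique(groups):
        keep &= detected[:, groups == label].sum(axis=1) >= min_per_group
    return keep
```

Albumin peptides would be removed in a separate step, for example by matching on the protein identifier.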
After log-transforming the data, signals of the remaining peptides
were normalized by applying sample-specific correction factors. These
correction factors were estimated on the basis of the observed intensities
of the peptides that were detected across samples. Peptides whose mean
intensity (across all samples) fell in the top or bottom 15% of peptide
intensities were excluded from the estimation of the correction factors.
The remaining peptide intensities were centered over all samples. To
obtain the centered intensity (cPI) of each peptide i ∈ (1,…,N) in sample
k ∈ (1,…,K), the mean peptide intensity (PI) of peptide i was subtracted
from the peptide intensity of peptide i in sample k:

$$\mathrm{cPI}_{ik} = \mathrm{PI}_{ik} - \frac{1}{K}\sum_{k'=1}^{K}\mathrm{PI}_{ik'}$$

Because the $\mathrm{PI}_{ik}$ values are log-transformed intensities, this
corresponds to dividing the intensities by their geometric mean. For
estimating the correction factor for sample k ∈ (1,…,K), a trimmed mean
intensity ($\mathrm{tPI}_{k}$) was computed from the centered peptide
intensities, excluding the top 10% of peptides with the highest intensities
and the bottom 10% of peptides with the lowest intensities in each sample.
Last, these sample-specific correction factors were subtracted
from the measured intensities sample-wise to retrieve the normalized
peptide intensities (nPI), which were used in the following analysis:

$$\mathrm{nPI}_{ik} = \mathrm{PI}_{ik} - \mathrm{tPI}_{k}$$

Note that this correction was performed on all peptides, not only
those that were used to compute the correction factor.
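
The normalization steps above can be sketched in Python as follows. This is an illustrative reading of the procedure, not the code used in the study. It assumes a peptides × samples matrix of log-transformed intensities, uses only peptides without missing values to estimate the correction factors, and applies quantile cut-offs for the 15% exclusion; the function name normalize_log_intensities and its parameters are hypothetical.

```python
import numpy as np

def normalize_log_intensities(pi, outer_frac=0.15, trim_frac=0.10):
    """Return (nPI, tPI) for a peptides x samples matrix of log intensities."""
    pi = np.asarray(pi, dtype=float)
    # Assumption: correction factors are estimated from peptides that were
    # detected in every sample.
    ref = pi[~np.isnan(pi).any(axis=1)]
    # Exclude peptides whose mean intensity lies in the top or bottom 15%.
    means = ref.mean(axis=1)
    lo, hi = np.quantile(means, [outer_frac, 1 - outer_frac])
    ref = ref[(means >= lo) & (means <= hi)]
    # cPI_ik = PI_ik - mean over samples of PI_i.
    cpi = ref - ref.mean(axis=1, keepdims=True)
    # tPI_k: per-sample trimmed mean of cPI, dropping the top and bottom 10%.
    cpi_sorted = np.sort(cpi, axis=0)
    cut = int(trim_frac * cpi_sorted.shape[0])
    tpi = cpi_sorted[cut:cpi_sorted.shape[0] - cut].mean(axis=0)
    # nPI_ik = PI_ik - tPI_k, applied to all peptides.
    return pi - tpi, tpi
```

If every sample differs from the others only by a constant offset on the log scale, the returned correction factors equal those offsets (centered on zero), and the normalized columns coincide.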
Summarizing triplicates in brain data. After filtering peptides and
normalizing the peptide signals of the individual samples in the brain
data, the triplicates were summarized in both the LiP and trypsin-only
peptide data. This step was not necessary for the CSF data, for which
we did not have sufficient material for replicates. A mean peptide
intensity was estimated if a peptide was detected in at least two of the
triplicates. If a mean peptide intensity could be estimated in all ten
donors, the peptide remained in the analysis. All following analyses
were performed with these summarized replicate intensities.
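
A minimal sketch of this replicate summary is given below. It assumes a peptides × donors × replicates array with NaN for undetected peptides; the function name summarize_replicates is hypothetical, and the code is an illustration rather than the implementation used in the study.

```python
import numpy as np

def summarize_replicates(x, min_detected=2):
    """Average replicates per donor and keep peptides quantified in all donors.

    x: peptides x donors x replicates array, NaN where not detected.
    Returns (means of the kept peptides, boolean mask of kept peptides).
    """
    x = np.asarray(x, dtype=float)
    n_detected = (~np.isnan(x)).sum(axis=2)
    sums = np.nansum(x, axis=2)
    # A mean is defined only if the peptide was seen in >= min_detected replicates.
    means = np.where(n_detected >= min_detected,
                     sums / np.maximum(n_detected, 1), np.nan)
    # Keep peptides for which a mean could be estimated in every donor.
    keep = ~np.isnan(means).any(axis=1)
    return means[keep], keep
```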
Estimating protein abundances from trypsin-only peptide levels.
Protein abundances were estimated from the normalized peptide
intensities of the trypsin-only data using mapDIA97. Half-tryptic pep-
tides and peptides mapping to multiple proteins were excluded from
this preprocessing step. If a protein had multiple isoforms, only one
protein abundance was estimated. Protein abundances were then
estimated with mapDIA, using the settings listed in the mapDIA parameter
table below, and subsequently log-transformed.
Assessing structural variability in cerebrospinal fluid of
healthy individuals
Identifying peptides with increased structural variability between
samples. In order to identify protein regions that show relatively
large inter-individual structural variation, we focused on peptides
with large variation in the LiP data, but relatively small variation in the
trypsin-only data. Therefore, we are identifying peptides that vary
specifically after proteinase K digestion, but not due to other factors,
such as technical variability or variation in protein abundance.
In a first step, we removed outlier measurements using Cook’s
distance. We fitted the peptide intensities of each of the LiP and the
trypsin-only data to the corresponding trypsin-only protein abun-
dances. Subsequently, we estimated the influence that each peptide
measurement has on the fitted response value using Cook’s distance.
A common threshold, equal to four divided by the number of observa-
tions, was applied, and all peptide measurements with a Cook’s D above
this value were removed from further analysis.
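
The outlier filter can be illustrated with the following Python sketch. It assumes a simple linear model with an intercept, in which the intensity of one peptide is regressed on the abundance of its protein across individuals; the exact model used in the study is not specified here, and the function names are hypothetical.

```python
import numpy as np

def cooks_distance(abundance, intensity):
    """Cook's distance of each observation in the fit intensity ~ abundance."""
    x = np.asarray(abundance, dtype=float)
    y = np.asarray(intensity, dtype=float)
    X = np.column_stack([np.ones_like(x), x])      # intercept + slope
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    # Leverage: diagonal of the hat matrix X (X'X)^-1 X'.
    h = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    s2 = resid @ resid / (n - p)                   # residual variance
    return resid**2 / (p * s2) * h / (1 - h) ** 2

def keep_noninfluential(abundance, intensity):
    """True for measurements whose Cook's D does not exceed 4 / n."""
    d = cooks_distance(abundance, intensity)
    return d <= 4 / len(d)
```

Measurements flagged False would be dropped before the s.d. values are computed.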
Subsequently, the s.d. of each peptide was estimated within the
LiP and within the trypsin-only samples. A variability score (VarS) was
computed by subtracting the s.d. of the peptide in the trypsin-only
data from its s.d. in the LiP data. P values corresponding to the
difference in s.d. values were estimated using the robust Brown–Forsythe
Levene-type procedure from the R package lawstat (version 3.4)98.
Peptides with a score above 1 and a P value below 1 × 10⁻⁵ were defined
as highly variable peptides. Peptides with scores between 0.5 and 1 and a
P value below 1 × 10⁻⁵ were defined as medium-variable peptides. If the VarS
was below 0.5 or the P value was above 1 × 10⁻⁵, peptides were defined as
non-variable. We observed that some peptides had a bimodal distribu-
tion of their signals in the LiP data, indicating a ‘switch-like’ structural
change of the protein, that is, a protein populating two states. In order
to systematically identify such peptides, we used the function modetest
from the R package multimode (version 1.5) on the normalized LiP
intensities99 (Supplementary Tables 4 and 5).
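
The variability score and classification can be sketched in Python as shown below. Note the substitution: the study used the R package lawstat, whereas this sketch uses scipy.stats.levene with center="median", which is the Brown–Forsythe variant of Levene's test. The function name classify_variability is hypothetical, and the handling of scores exactly equal to 0.5 or 1 is an assumption, because the text does not specify it.

```python
import numpy as np
from scipy import stats

def classify_variability(lip, tryp, p_cut=1e-5):
    """Return (VarS, P value, class) for one peptide.

    lip, tryp: normalized log intensities of the peptide across individuals
    in the LiP and trypsin-only data (outliers and missing values removed).
    """
    lip = np.asarray(lip, dtype=float)
    tryp = np.asarray(tryp, dtype=float)
    # VarS: s.d. in the LiP data minus s.d. in the trypsin-only data.
    var_s = np.std(lip, ddof=1) - np.std(tryp, ddof=1)
    # center="median" selects the Brown-Forsythe form of Levene's test.
    _, p_value = stats.levene(lip, tryp, center="median")
    if p_value < p_cut and var_s > 1:
        label = "high"
    elif p_value < p_cut and var_s >= 0.5:   # boundary handling is assumed
        label = "medium"
    else:
        label = "non-variable"
    return var_s, p_value, label
```

The bimodality test (modetest from the R package multimode) is not reproduced in this sketch.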
Disorder prediction. Intrinsically disordered regions were predicted
from full-length protein sequences using DISOPRED version 3.1 (ref. 30),
relying on BLAST version 2.2.26 and the uniref90 database. The mean
disorder score was computed over every peptide. Out of 9,888 peptides
that were included in the analysis of variable peptides, mean disorder
scores were successfully computed for 9,235 peptides. Statistical sig-
nificance for differences in the degree of disorder for highly-variable,
medium-variable, and non-variable peptides was assessed using
Fisher’s exact test; peptides with a mean disorder score >0.5 were
considered disordered, whereas peptides with a mean disorder score
<0.5 were considered ordered.
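As an illustration of the test described above, the following sketch builds the 2 × 2 table of disordered versus ordered peptides for two peptide classes and computes a two-sided Fisher's exact P value. It is a plain-Python stand-in, not the implementation used in the study; a score of exactly 0.5, which the text leaves undefined, is counted as ordered here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables that are at most as likely as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def disorder_table(scores_a, scores_b, cutoff=0.5):
    """Counts of disordered (> cutoff) and ordered peptides in two classes."""
    dis_a = sum(s > cutoff for s in scores_a)
    dis_b = sum(s > cutoff for s in scores_b)
    return dis_a, len(scores_a) - dis_a, dis_b, len(scores_b) - dis_b
```

The four counts returned by `disorder_table` can be passed directly to `fisher_exact_two_sided`.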
Secondary structure prediction. Secondary structures were pre-
dicted from protein sequences using Psipred v3.5 (ref. 100). The sec-
ondary structure type (helix, beta strand, or loop) with the highest
likelihood was assigned to every residue of the protein sequence. On the
basis of the assigned secondary structure for every residue, the content
of helical, beta strand, or loop secondary structure was calculated for
all 9,888 peptides included in the analysis of inter-individual
variability using a custom-made Python script, facilitated by the
rstoolbox library101. Statistical significance for differences in how
often different
secondary structures are present in highly-variable, medium-variable,
and non-variable peptides was assessed using the Wilcoxon rank-sum
test. All Wilcoxon rank-sum tests in this paper were performed as
two-tailed tests.
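The per-peptide secondary structure content can be sketched in a few lines. This is a hypothetical plain-Python equivalent, not the authors' rstoolbox-based script; it assumes a per-residue string in the usual three-state alphabet (H for helix, E for strand, C for loop) and 1-based, inclusive peptide coordinates.

```python
def secondary_structure_content(ss_string, start, end):
    """Fractions of helix (H), strand (E), and loop (C) in residues start..end.

    Positions are 1-based and inclusive.
    """
    segment = ss_string[start - 1:end]
    return {state: segment.count(state) / len(segment) for state in "HEC"}


# Hypothetical per-residue assignment for a 16-residue protein.
protein_ss = "CCHHHHHHCCEEEECC"
content = secondary_structure_content(protein_ss, 3, 10)
```

Here the peptide spanning residues 3 to 10 is 75% helix and 25% loop.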
Other structural descriptors. Sequence-level structural and
functional annotations were obtained from DescribePROT (accessed
October 2020), a recently released database of structural and
functional annotations of proteins102. DescribePROT contains a
collection of pre-computed scores from previously developed predictors,
such as SCRIBER (protein-protein interaction propensity)29, VSL2B
(disorder)103, ASAquick (solvent-accessible surface area)104, and
DRNApred to predict DNA- and RNA-binding propensity105. Scores
were obtained for 9,754 out of 9,888 peptides included in the
analysis of inter-individual variability. Statistical significance for
differences in sequence-level structural and functional annotations of
highly-variable, medium-variable, and non-variable peptides was
assessed using the Wilcoxon rank-sum test.
Parameter              Input CSF data       Input brain data
LEVEL                  2                    2
LOG2_TRANSFORMATION    false                false
EXPERIMENTAL_DESIGN    IndependentDesign    IndependentDesign
SDF                    2                    2
MIN_CORREL             0.3                  0.3
MIN_OBS                5 5                  5 5
MIN_PEP_PER_PROT       2                    2
LABELS                 HG PDG               HG PDG
SIZE                   53 52                5 5
MIN_DE                 .01                  .01
MAX_DE                 .99                  .99
CONTRAST               - 0                  - 0
                       1 -                  1 -
MAX_PEP_PER_PROT       inf                  inf
Nature Structural & Molecular Biology
Article https://doi.org/10.1038/s41594-022-00837-0
STRING analysis. Physical and functional protein-protein
interactions were obtained from the STRING database, version 11.0
(ref. 106; accessed in December 2020). To capture both physical and
functional interactions, we focused on the STRING combined score that is a
weighted combination of probabilities in different evidence channels.
Because the STRING analysis is carried out on proteins, rather than
peptides, it was necessary to classify proteins as variable or invariable.
Proteins that contained at least one highly variable peptide were clas-
sified as variable (64 out of 994 proteins in total), whereas all other
proteins were classified as invariable (930 proteins). High-confidence
interactions, defined by a cut-off of STRING combined score > 0.9, were
counted for each protein. Statistical significance between the number
of STRING interactors per protein defined as variable and invariable
was assessed using the Wilcoxon rank-sum test.
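The protein-level counting step can be sketched as follows. The edge list, protein names, and function names are hypothetical, and scores are assumed to be on a 0 to 1 scale (STRING flat files typically store them multiplied by 1,000, in which case they would need rescaling first).

```python
from collections import defaultdict


def count_high_confidence_partners(edges, cutoff=0.9):
    """Number of distinct partners per protein with combined score > cutoff."""
    partners = defaultdict(set)
    for protein_a, protein_b, combined_score in edges:
        if combined_score > cutoff:
            partners[protein_a].add(protein_b)
            partners[protein_b].add(protein_a)
    return {protein: len(found) for protein, found in partners.items()}


def split_counts(proteins, variable_proteins, counts):
    """Interactor counts for variable and invariable proteins.

    variable_proteins: proteins with at least one highly variable peptide.
    Proteins without any high-confidence interaction get a count of 0.
    """
    variable = [counts.get(p, 0) for p in proteins if p in variable_proteins]
    invariable = [counts.get(p, 0) for p in proteins
                  if p not in variable_proteins]
    return variable, invariable


# Hypothetical example: protein "A" carries a highly variable peptide.
edges = [("A", "B", 0.95), ("A", "C", 0.92), ("B", "C", 0.50), ("C", "D", 0.91)]
counts = count_high_confidence_partners(edges)
variable, invariable = split_counts(["A", "B", "C", "D", "E"], {"A"}, counts)
```

The two resulting lists are the inputs that a rank-sum test would compare.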
Domain analysis. Protein domains, as annotated in the PFAM database,
were extracted from UniProt (December 2020). Unique domains were
counted for each protein, and the number of domains compared between
variable and invariable proteins, defined as described above. The statisti-
cal significance of the different number of domains in variable and invari-
able proteins was assessed using the Wilcoxon rank-sum test.
Structural igures. Crystal structures of selected proteins were
obtained from the Protein Data Bank
107
, accessed in January 2021. Only
structures that contained the peptide sequences of interest (that is,
candidate biomarker peptides or variable peptides) were considered.
If multiple PDB structures were found, structures of the wild-type
sequence with the highest resolution were preferred. The peptide
sequences were aligned to the PDB sequence and colored according
to their significance using a custom-made python script. Similarly,
UniProt annotations of functional sites and binding sites were mapped
to the PDB structure, and their Euclidean distance to the peptides of
interest was computed. All structural images were generated in PyMOL
version 2.4.1.
Inferring structural variation for Parkinson’s disease in
cerebrospinal fluid data
Estimating effect sizes of covariates on peptide intensities and
protein abundances. Cohort effects and effects of other covariates
on the intensities of LiP peptides were estimated using linear models.
The aim was to identify peptides that were significantly differently
digested by proteinase K between the two cohort groups (PDG versus
HG) and therefore indicate structural rearrangement (accessibility
change) of the corresponding protein. An increase in accessibility will
lead to increased production of half-tryptic peptides resulting from
proteinase K digestion, while the relative abundance of the respective
fully tryptic peptides from the same protein region declines. Here, we
used only changes in the abundance of fully tryptic peptides as indica-
tors of accessibility changes because the fully tryptic peptide signal
has lower noise than that of half-tryptic peptides.
Normalized LiP peptide intensities ($\mathrm{Pep}_{\mathrm{LiP}}$) were modeled as a
function of various technical and biological factors, specifically the level of
the host protein, the intensity of the peptide in the trypsin-only data,
the total protein content (TPC) of the sample, batch (Batch), age (A),
sex (S), and disease status (C, healthy or PD). In addition, we modeled
interactions between some of these factors, as detailed below. We
observed that intensities of LiP peptides were correlated with the total
protein abundance estimated from the trypsin-only intensities
($\mathrm{Prot}_{\mathrm{to}}$) of each sample, and thus included this confounder in
the linear models. By including peptide intensities from the
trypsin-only treated samples ($\mathrm{Pep}_{\mathrm{to}}$) in the model, we accounted
for potential peptide-specific intensity changes that were not due to
variable proteinase K accessibility, such as post-translational
modifications, protein isoforms, or endogenous cleavage. Such changes
would equally affect peptide intensities
in the trypsin-only and LiP samples and would thus be PK-independent.
Because we lack the TPC measurement of one of the healthy samples,
this sample was removed from all of the following analyses, resulting
in 51 healthy and 51 PD samples being included.
The resulting full model is:
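$$\mathrm{Pep}_{\mathrm{LiP}} = \beta_0 + \beta_1 A + \beta_2 C + \beta_3 S + \beta_4 AC + \beta_5 AS + \beta_6 CS + \beta_7 \mathrm{Pep}_{\mathrm{to}} + \beta_8 \mathrm{Prot}_{\mathrm{to}} + \beta_9 \mathrm{TPC} + \beta_{10} \mathrm{Batch}$$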
Age (A), cohort group membership (healthy versus PD; C), and
sex (S), as well as their two-way interactions, were modeled; β0 is the
intercept and β1–β10 are the regression coefficients. Cohort and sex
are categorical variables that need to be modeled as factors using
so-called ‘dummy variables’.
Different encodings of categorical variables are possible, and the
encoding choice affects the interpretability of the resulting coefficients
(contrasts). In order to aid the interpretation for this study, we have
chosen the following encodings: cohort group was modeled using a
dummy coding defining the cohort variable C for ‘healthy’ as 0 and for
‘PD’ as 1. Thus, coefficients of the cohort term reflect the deviation of
the peptide signal in the diseased cohort group relative to healthy as
a reference. Sex, however, was encoded using deviation coding, setting
the variable S for ‘male’ to −1 and for ‘female’ to +1. Batch was modeled
as a factor using dummy coding, with the first measured batch being
used as the reference group. Note that the choice of the encoding also
affects the interaction terms. We never tested for the statistical signifi-
cance of the interaction terms, because we deemed the cohort size too
small to enable detection of interactions. However, we included
interaction terms in the model as a conservative approach to avoid
overestimating marginal effects. After fitting all linear models, we
excluded those that could not be fitted to at least 80 of the 103
samples (owing to missing data) from any further analysis. The
statistical significance
of coefficients (contrasts) was determined using t statistics (two-tailed
test of hypothesis). The t value $t_{\beta_i}$ was calculated by dividing the
coefficient $\beta_i$ of a variable $i$ by its standard error $S_{\beta_i}$:
$$t_{\beta_i} = \frac{\beta_i}{S_{\beta_i}}$$
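The encodings described in this section (dummy coding for cohort and batch, deviation coding for sex) and the t value can be illustrated with a short sketch. The function and argument names are hypothetical. With more than two batches, the single Batch term of the model expands into one dummy column per non-reference batch.

```python
def design_row(age, is_pd, sex, pep_to, prot_to, tpc, batch, batches):
    """One row of the design matrix for the full LiP model.

    batches: ordered list of batch labels; the first one is the reference.
    """
    c = 1.0 if is_pd else 0.0              # dummy coding: healthy = 0, PD = 1
    s = 1.0 if sex == "female" else -1.0   # deviation coding: male = -1, female = +1
    batch_dummies = [1.0 if batch == b else 0.0 for b in batches[1:]]
    # Order: intercept, A, C, S, AC, AS, CS, Pep_to, Prot_to, TPC, batch dummies.
    return [1.0, age, c, s, age * c, age * s, c * s,
            pep_to, prot_to, tpc] + batch_dummies


def t_value(coefficient, standard_error):
    """t statistic of a coefficient: estimate divided by its standard error."""
    return coefficient / standard_error


# Hypothetical PD donor: 70-year-old male measured in the second of three batches.
row = design_row(70.0, True, "male", 12.0, 15.0, 0.4, "B2", ["B1", "B2", "B3"])
```

Because healthy is encoded as 0, every interaction column that involves the cohort variable is 0 for healthy donors, which is what later motivates the scaling of the residuals.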
The number of peptides detected per protein strongly varies
depending on the protein and its abundance. Thus, proteins with
many peptides have a higher probability that one of their peptides
shows a significant effect by chance compared with proteins with
few peptides. Hence, we corrected for this multiple testing effect
by computing corrected P values using the Benjamini–Hochberg
approach over all peptides from the same protein (‘protein-wise’).
Peptides with a corrected P value ≤ 0.05 were defined as candidate
biomarker peptides and used for subsequent steps (Supplementary
Tables 4 and 7).
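A minimal sketch of the protein-wise correction: the Benjamini–Hochberg adjustment is applied separately to the peptides of each protein, so a protein with a single peptide keeps its raw P value. The input layout and function names are assumptions made for illustration.

```python
from collections import defaultdict


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P value to enforce monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def protein_wise_bh(peptide_p_values):
    """Adjust P values within each protein.

    peptide_p_values: iterable of (protein, peptide, p_value) tuples.
    Returns a dict mapping (protein, peptide) to the adjusted P value.
    """
    by_protein = defaultdict(list)
    for protein, peptide, p in peptide_p_values:
        by_protein[protein].append((peptide, p))
    result = {}
    for protein, entries in by_protein.items():
        adjusted = benjamini_hochberg([p for _, p in entries])
        for (peptide, _), q in zip(entries, adjusted):
            result[(protein, peptide)] = q
    return result
```

Peptides whose adjusted value is at most 0.05 would then be taken forward as candidates.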
Investigation of the P values over all peptides corresponding to
age, cohort, and sex showed a depletion of small P values for sex
(Fig. 3b). To test whether the modeling of sex and interactions with
sex was biasing P values of the cohort variable, three additional linear
models were calculated with slight alterations to the model above:
(1) sex was excluded as a main factor and all interactions with sex were
removed from the model; (2) sex was excluded as a main factor but
interactions with sex were kept in the model; (3) sex was included
as a main factor but all interactions with sex were removed from
the model. No clear difference in the number of candidate peptides
for the cohort effect resulted from these changes, suggesting that
modeling sex does not bias the cohort variable toward smaller P values
(Extended Data Fig. 3a).
Intensity variation of peptides in the trypsin-only samples results
from effects other than protein accessibility changes, such as variation
in protein abundance, protein isoform changes, or post-translational
modifications. To model those effects, we created a corresponding
model for the trypsin-only data, including (as above) two-way interac-
tions between age (A), sex (S), and cohort (C):
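$$\mathrm{Pep}_{\mathrm{to}} = \beta_0 + \beta_1 A + \beta_2 C + \beta_3 S + \beta_4 AC + \beta_5 AS + \beta_6 CS + \beta_7 \mathrm{Prot}_{\mathrm{to}} + \beta_8 \mathrm{TPC} + \beta_9 \mathrm{Batch}$$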
The coefficients of the peptide variation model were evaluated
using the same analysis as that applied to the coefficients of the LiP
model. The same approach can also be used to model the half-tryptic
peptides from the LiP data. Because these peptides have no counterparts
in the trypsin-only data, the trypsin-only peptide term cannot be
included, which leads to a model with the same covariates as the one
shown above (Supplementary Tables 4 and 8).
Last, we also modeled protein abundances. These were fit to
batch, TPC, age, cohort, sex, and the two-way interactions of the
last three:
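$$\mathrm{Prot}_{\mathrm{to}} = \beta_0 + \beta_1 A + \beta_2 C + \beta_3 S + \beta_4 AC + \beta_5 AS + \beta_6 CS + \beta_7 \mathrm{TPC} + \beta_8 \mathrm{Batch}$$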
The evaluation was again performed using t statistics. Because
protein abundances were used, no protein-wise P value adjustment
was necessary (Supplementary Tables 4 and 9).
Estimating scaled residuals to relect the impact of the cohort
group membership on a single LiP peptide. Peptides with a sig-
nificant difference of the cohort coefficient in the LiP model are
predictive of the disease status of a sample. To actually predict the
disease status of a sample given the signal of a specific peptide, we
first estimated the expected intensity of that peptide if the per-
son was healthy. By computing the expected intensity, we correct
for confounders such as age, sex, or batch. We then compared the
observed intensity with the expected intensity in order to predict
the disease status.
We used the coefficients of the peptide-specific linear model to
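estimate the expected intensity of the LiP peptide $P_{\mathrm{HG}}$ in sample $i$ if the
donor was healthy:
$$P_{\mathrm{HG},i} = \beta_0 + \beta_1 A_i + \beta_2 \mathrm{HG} + \beta_3 S_i + \beta_4 A_i \mathrm{HG} + \beta_5 A_i S_i + \beta_6 \mathrm{HG}\, S_i + \beta_7 \mathrm{Pep}_{\mathrm{to},i} + \beta_8 \mathrm{Prot}_{\mathrm{to},i} + \beta_9 \mathrm{TPC}_i + \beta_{10} \mathrm{Batch}_i$$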
Because the cohort variable is 0 for HG samples, the formula can
also be re-written as:
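$$P_{\mathrm{HG},i} = \beta_0 + \beta_1 A_i + \beta_3 S_i + \beta_5 A_i S_i + \beta_7 \mathrm{Pep}_{\mathrm{to},i} + \beta_8 \mathrm{Prot}_{\mathrm{to},i} + \beta_9 \mathrm{TPC}_i + \beta_{10} \mathrm{Batch}_i$$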
Subsequently, by subtracting the predicted intensity for a healthy
sample $P_{\mathrm{HG},i}$ from the observed LiP intensity of the sample
$\mathrm{Pep}_{\mathrm{LiP},i}$, the residual ($R_i$) was retrieved:
$$R_i = \mathrm{Pep}_{\mathrm{LiP},i} - P_{\mathrm{HG},i}$$
The healthy samples result in residuals that are, on average, closer
to zero than the PD samples. Because our model contains two-way
interactions, an additional scaling step was applied. The two-way
interactions not only include information about the cohort group but
also about the age and the sex of the PD donors. If we completely
remove the two-way interaction terms from the predictions, the residu-
als estimated for the PD donors will also include variation due to the
age and sex of the samples. At the same time, this is not the case for the
healthy samples (because ‘healthy’ was encoded as 0, any interaction
with ‘healthy’ will be 0). To prevent this difference between predictions
for HG and PDG samples, we also computed the expected intensity if
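the sample came from a PD individual ($P_{\mathrm{PDG},i}$) and then scaled the
residuals by the difference between the expected intensities assuming
healthy versus diseased (that is, $P_{\mathrm{PDG},i} - P_{\mathrm{HG},i}$):
$$P_{\mathrm{PDG},i} = \beta_0 + \beta_1 A_i + \beta_2 \mathrm{PDG} + \beta_3 S_i + \beta_4 A_i \mathrm{PDG} + \beta_5 A_i S_i + \beta_6 \mathrm{PDG}\, S_i + \beta_7 \mathrm{Pep}_{\mathrm{to},i} + \beta_8 \mathrm{Prot}_{\mathrm{to},i} + \beta_9 \mathrm{TPC}_i + \beta_{10} \mathrm{Batch}_i$$
thus
$$P_{\mathrm{PDG},i} - P_{\mathrm{HG},i} = \beta_2 \mathrm{PDG} + \beta_4 A_i \mathrm{PDG} + \beta_6 \mathrm{PDG}\, S_i$$
and
$$\mathrm{RS}_i = \frac{\mathrm{Pep}_{\mathrm{LiP},i} - P_{\mathrm{HG},i}}{P_{\mathrm{PDG},i} - P_{\mathrm{HG},i}}$$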
Note that this scaling also corrects for the direction (sign) of the
cohort effect: after the scaling, residuals will be positive (on average)
for samples from individuals with PD, irrespective of whether β2, β4,
and β6 are positive or negative. It is possible that, owing to specific
combinations of age and sex, the predicted intensities for HG and
PDG are almost identical, so that $P_{\mathrm{PDG},i} - P_{\mathrm{HG},i}$ is numerically
close to zero.
Without any limitation on the denominator, this would lead to very
large scaled residuals (RSi). Therefore, samples for which the differ-
ence between these two predictions was smaller than the s.d. of the
difference over all samples were instead scaled by the s.d. while pre-
serving the algebraic sign. Taken together, this approach is a com-
promise between accounting for personal confounders, such as age
and sex, while at the same time avoiding extreme outlier prediction
values. In case of a predictive peptide, scaled residuals of HG samples
will be close to 0, and scaled residuals of PDG samples will be
close to +1.
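The scaling step, including the lower bound on the denominator, can be sketched as follows. The sketch takes the two model predictions per sample as inputs rather than recomputing them from the coefficients. The function name and the toy values are hypothetical, and the sample s.d. is assumed.

```python
from math import copysign
from statistics import stdev


def scaled_residuals(observed, expected_hg, expected_pdg):
    """Residuals from the healthy prediction, scaled by the PDG - HG difference.

    Differences smaller in magnitude than the s.d. of the difference over all
    samples are replaced by that s.d., keeping their sign. The sketch assumes
    that the differences are not all identical (s.d. > 0).
    """
    diffs = [pdg - hg for hg, pdg in zip(expected_hg, expected_pdg)]
    floor = stdev(diffs)
    scaled = []
    for obs, hg, diff in zip(observed, expected_hg, diffs):
        if abs(diff) < floor:
            diff = copysign(floor, diff)  # too small: use the s.d., keep the sign
        scaled.append((obs - hg) / diff)
    return scaled


# Hypothetical predictions for four samples; the third has nearly identical
# healthy and PD predictions and is therefore scaled by the s.d. instead.
expected_hg = [10.0, 10.0, 10.0, 10.0]
expected_pdg = [13.0, 13.0, 10.5, 7.0]
observed = [13.0, 10.0, 10.5, 7.0]
rs = scaled_residuals(observed, expected_hg, expected_pdg)
```

In this toy case, the first and fourth samples match the PD prediction and obtain a scaled residual of 1 regardless of the sign of the cohort effect, while the second matches the healthy prediction and obtains 0.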
Gene Ontology enrichment analysis. To obtain an overview of the
proteins detected in our CSF samples, we performed a GO enrichment
analysis using all proteins we detected as the foreground distribu-
tion and all proteins we searched for as the background distribution
(UniProt FASTA, July 2019). We performed GO enrichment using the
classic algorithm of topGO (version 2.36.0)108. The minimum node
size was set to 10.
We additionally performed GO enrichment analysis for all proteins
containing at least one biomarker candidate peptide (P ≤ 0.05). This
set of proteins could be defined as structurally affected because the
protein-wise FDR was already applied to the P values of the candidate
biomarker peptides. All proteins for which we assess structural varia-
tion (that is, all proteins with at least one peptide included in the LiP
model) were used as the background distribution. GO enrichment
was then performed using the weight01 algorithm of topGO, and the
minimum node size was set to 10.
Classifying healthy and Parkinson’s disease samples using LASSO
models. We used fivefold cross-validation to test how well healthy and
PD samples could be classified in independent test datasets. The data
were randomly divided into five folds, with HG and PDG samples evenly
distributed between the individual folds. Then, a training dataset
was defined containing four of the folds, and the last fold was used as
the test dataset. Peptides that were not measured in all samples were
excluded from this analysis. We then trained the LiP model (which
models LiP peptide intensity variation as a function of several factors,
as described above) on the training dataset and selected peptides
with a significant cohort effect (P<0.05; after protein-wise FDR; see
above). Subsequently, we estimated scaled residuals of the candidate
biomarker peptides for both the training and the test datasets based
on the linear model estimated from the training data.
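The fold assignment can be sketched as a stratified split. This is a minimal illustration with a hypothetical function name and a fixed seed, not the authors' code. Repeating the call with five different seeds would give the five repetitions of the fivefold scheme.

```python
import random


def stratified_folds(labels, n_folds=5, seed=0):
    """Assign sample indices to folds so that each class is spread evenly."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(set(labels)):
        indices = [i for i, current in enumerate(labels) if current == label]
        rng.shuffle(indices)
        # Deal the shuffled samples of this class out to the folds in turn.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


labels = ["HG"] * 51 + ["PDG"] * 51
folds = stratified_folds(labels)

# Each fold serves once as the test set; the other four form the training set.
splits = [(sorted(i for k, f in enumerate(folds) if k != test_k for i in f),
           sorted(folds[test_k]))
          for test_k in range(len(folds))]
```

Peptide selection and model fitting would then be repeated on each training set only, so that the test fold never informs the model.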
The scaled residuals of the training dataset were then used to build
logistic regression LASSO models (R package glmnet, version 4.1-1
(ref. 109)) for classifying healthy and PD samples. For this step, missing
values in the scaled residuals were imputed using the mean scaled
residual of the respective training or test dataset over all samples. The
regularization parameter λ was chosen such that LASSO models using
one, five, or ten predictors were built. If multiple values of λ were
possible, nested cross-validation was used to determine the best option.
If no λ yielded a model with exactly 5 or 10 predictors, the closest
possible option was chosen; hence, models with 4 or 6 and 9 or 11
predictors were calculated. Thus, we tested how well disease
status could be predicted using individual peptides or combinations
of peptides. Missing values were imputed as the mean value of the
respective peptide (that is, assuming no effect).
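The choice of λ for a target number of predictors can be sketched as a search along a regularization path. The path below is hypothetical (pairs of λ and the number of non-zero coefficients), and the function is a plain-Python illustration rather than a glmnet call. When several λ values are returned, the text above resolves the choice by nested cross-validation, which is not shown here.

```python
def pick_lambdas(path, target):
    """Lambda values whose models are closest to the target predictor count.

    path: list of (lambda_value, n_nonzero_coefficients) along a LASSO path.
    Returns all lambdas with exactly `target` predictors or, if there are
    none, all lambdas with the closest available predictor count.
    """
    best_gap = min(abs(n - target) for _, n in path)
    return [lam for lam, n in path if abs(n - target) == best_gap]


# Hypothetical path in which no lambda gives exactly 5 or 10 predictors.
path = [(1.0, 0), (0.5, 1), (0.3, 4), (0.2, 6), (0.1, 9), (0.05, 11)]
one = pick_lambdas(path, 1)
five = pick_lambdas(path, 5)
ten = pick_lambdas(path, 10)
```

For this path, the targets of 5 and 10 predictors fall back to the neighboring models with 4 or 6 and 9 or 11 predictors, respectively.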
This approach was applied five times, iteratively using all five folds
as the test dataset once. Additionally, the separation into five folds was
also repeated five times to make the results more stable, resulting in 25
models per number of defined predictors. For further interpretation of
the data, the log (odds) were estimated. Scores for each sample resulting
from models with the same number of predictors were then summarized
by estimating the mean, leading to a weighted ensemble approach.
In order to obtain log (odds ratios) based on the oligomeric/total
α-synuclein ratio that were comparable to the log (odds ratios) origi-
nating from the peptide models, we performed logistic regression
of the disease status versus the oligomeric/total α-synuclein ratio
in our cohort while applying the same cross-validation scheme as
above. Samples for which no α-synuclein measurements were available
were removed from the α-synuclein models. Thus, those models were
trained on a slightly smaller cohort than were the models not using
α-synuclein levels.
The same modeling approach was taken to predict disease status
using the trypsin-only intensities, as well as the protein abundances.
Additionally, alternative linear models fitting LiP intensities were
built. The first did not include the trypsin-only peptide intensities
or the protein abundances. Hence, not only are structural variations
fitted, but also PK-independent peptide variation as well as protein
abundance variation, resulting in:
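$$\mathrm{Pep}_{\mathrm{LiP}} = \beta_0 + \beta_1 A + \beta_2 C + \beta_3 S + \beta_4 AC + \beta_5 AS + \beta_6 CS + \beta_9 \mathrm{TPC} + \beta_{10} \mathrm{Batch}$$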
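Also, models including only one of trypsin-only peptide intensities
or protein abundances as covariates were estimated, hence:
$$\mathrm{Pep}_{\mathrm{LiP}} = \beta_0 + \beta_1 A + \beta_2 C + \beta_3 S + \beta_4 AC + \beta_5 AS + \beta_6 CS + \beta_7 \mathrm{Pep}_{\mathrm{to}} + \beta_9 \mathrm{TPC} + \beta_{10} \mathrm{Batch}$$
and
$$\mathrm{Pep}_{\mathrm{LiP}} = \beta_0 + \beta_1 A + \beta_2 C + \beta_3 S + \beta_4 AC + \beta_5 AS + \beta_6 CS + \beta_8 \mathrm{Prot}_{\mathrm{to}} + \beta_9 \mathrm{TPC} + \beta_{10} \mathrm{Batch}$$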
This leads to three additional types of linear models used for mod-
eling LiP peptides, including or excluding different combinations of
PK-independent peptides and protein abundance variation from the
fitted value.
We note that the cross-validation scheme used guards against
overfitting, because model performance is only evaluated with data or
samples not used for model estimation. In addition, the models using
LiP intensities and trypsin-only intensities were trained using the same
set of peptides, so that any overfitting due to the number of peptides
(potential predictors) would be expected in both models; hence, a
superior performance of the structural model cannot be attributed to this.
Inferring structural variation in Parkinson’s disease in brain
data
To assess structural variation between the HG and PDG in the brain data,
a linear modeling approach as for the CSF data was applied. Normalized
LiP peptide intensities ($\mathrm{Pep}_{\mathrm{LiP}}$) of all fully tryptic peptides
were modeled as a function of various factors, including the protein
abundance ($\mathrm{Prot}_{\mathrm{to}}$) and peptide intensities of the trypsin-only
samples ($\mathrm{Pep}_{\mathrm{to}}$).
Additionally, the cohort group membership (C) was modeled as a fac-
tor using dummy coding. Age and sex were not modeled, because all
donors were female and between 81 and 84. All samples were processed
in one batch, and all samples were adjusted to the same starting protein
concentration, so there was no need to correct for TPC. The following
linear model results for the brain data:
$$\mathrm{Pep}_{\mathrm{LiP}} = \beta_0 + \beta_1 C + \beta_2 \mathrm{Pep}_{\mathrm{to}} + \beta_3 \mathrm{Prot}_{\mathrm{to}}$$
The statistical significance of coefficients (contrasts) was
determined using t statistics (two-tailed test of hypothesis), again
using the same approach as for the CSF data. When validating can-
didates previously found in the CSF, these P values were used to
compare the brain with the CSF data, that is without correcting for
the number of peptides per protein. For defining overall structural
changes between healthy and PD samples in the brain data, we determined the
protein-wise FDR.
To infer intensity variation of peptides in the trypsin-only samples
resulting from effects other than protein accessibility changes, models
for the trypsin-only data were created:
$$\mathrm{Pep}_{\mathrm{to}} = \beta_0 + \beta_1 C + \beta_2 \mathrm{Prot}_{\mathrm{to}}$$
The coefficients of the peptide variation models were evaluated
using the same analysis as applied to the coefficients of the LiP models.
(Data corresponding to candidate CSF biomarker peptides, which
were also analyzed in the brain data, can be found in Supplementary
Table 11.)
Additionally, protein abundances were modeled:
$$\mathrm{Prot}_{\mathrm{to}} = \beta_0 + \beta_1 C$$
The evaluation was again performed using t statistics. Because
protein abundances were used, no protein-wise P value adjustment
was necessary.
In vitro experiments using purified proteins
Limited proteolysis for in vitro experiments. Aldolase A human
(SRP6370), D-fructose 1,6-bisphosphate trisodium salt hydrate
(F6803), and 1α,25-dihydroxyvitamin D3 (D1530) were purchased
from Sigma Aldrich. Native human vitamin-D-binding protein
(ab90920) was purchased from Abcam. Fructose bisphosphate aldolase
and D-fructose 1,6-bisphosphate trisodium salt hydrate were
resuspended in LiP buffer (100 mM HEPES, 150 mM KCl, 1 mM MgCl2, pH
7.4) to a final concentration of 20 µM and 20 mM (1:1,000 molar ratio),
respectively. Vitamin-D-binding protein and 1α,25-dihydroxyvitamin
D3 were resuspended in LiP buffer to a final concentration of 15 µM
and 50 µM (1:3.3 molar ratio), respectively. For each in vitro
experiment, the purified protein and substrate were mixed and incubated
for 2minutes at 25°C. The subsequent LiP and trypsin-digestion steps
were performed as for the CSF and brain samples, described above.
Liquid chromatography–mass spectrometry methods for in vitro
experiments. For the in vitro experiments, the samples were measured
in DIA mode. The acquisition settings were identical to the ones used
for the brain samples, except that the gradient length was reduced to
60 minutes.
Data analysis of in vitro experiments. The DIA raw files were
searched using the default directDIA pipeline of Spectronaut
15.6.211220.50606 (Biognosys) against the human UniProt FASTA
(downloaded in March 2020), including the sequence of proteinase
K. The digestion specificity was changed to semi-tryptic, and all
other settings were set to default. Peptide precursor abundance
comparison between treated and untreated conditions was done
using a moderated t test and Benjamini–Hochberg adjustment after
median normalization (Supplementary Table 6). The significantly
changed peptides were mapped on the 3D structure using the R
package protti (version 0.2.0)110.
Reporting summary
Further information on research design is available in the Nature
Research Reporting Summary linked to this article.
Data availability
The mass spectrometry proteomics dataset generated in this study
is available in the PRIDE database111 (accession number PXD034120).
Source data are provided with this paper.
Code availability
Code for the main analyses (Figs. 2b, 3, and 5) has been
deposited on GitHub at
https://github.com/beyergroup/Global-analyses-of-the-human-structural-proteome-to-identify-a-new-type-of-disease-biomarker.
Further code for plots and other
analyses is available upon request. Supplementary Table 12 contains
all necessary data to use with the provided scripts.
References
80. Hughes, A. J., Daniel, S. E., Kilford, L. & Lees, A. J. Accuracy
of clinical diagnosis of idiopathic Parkinson’s disease:
a clinico-pathological study of 100 cases. J. Neurol. Neurosurg.
Psychiatry 55, 181–184 (1992).
81. Hoehn, M. M. & Yahr, M. D. Parkinsonism: onset, progression, and
mortality. Neurology 17, 427–442 (1967).
82. Fahn, S. et al. The Uniied Parkinson’s Disease Rating Scale. In
Fahn, S. et al. (eds.) Recent Developments in Parkinson’s Disease,
Vol. 2, 153-163 (1987).
83. Roth, M. et al. CAMDEX. A standardised instrument for the
diagnosis of mental disorder in the elderly with special
reference to the early detection of dementia. Br. J. Psychiatry 149,
698–709 (1986).
84. Alafuzo, I. et al. Staging/typing of Lewy body related
alpha-synuclein pathology: a study of the BrainNet Europe
Consortium. Acta Neuropathol. 117, 635–652 (2009).
85. Thal, D. R. et al. Sequence of Aβ-protein deposition in the
human medial temporal lobe. J. Neuropathol. Exp. Neurol. 59,
733–748 (2000).
86. Alafuzo, I. et al. Staging of neuroibrillary pathology in
Alzheimer’s disease: a study of the BrainNet Europe Consortium.
Brain Pathol. 18, 484–496 (2008).
87. Mirra, S. S. et al. The consortium to establish a registry for
Alzheimer’s disease (CERAD). Part II. Standardization of the
neuropathologic assessment of Alzheimer’s disease. Neurology
41, 479–486 (1991).
88. Montine, T. J. et al. National institute on aging-Alzheimer’s
Association guidelines for the neuropathologic assessment of
Alzheimer’s disease: a practical approach. Acta Neuropathol. 123,
1–11 (2012).
89. Thal, D. R., Griin, W. S. T., de Vos, R. A. I. & Ghebremedhin, E.
Cerebral amyloid angiopathy and its relationship to Alzheimer’s
disease. Acta Neuropathol. 115, 599–609 (2008).
90. Kovacs, G. G. et al. Aging-related tau astrogliopathy (ARTAG):
harmonized evaluation strategy. Acta Neuropathol. 131,
87–102 (2016).
91. Teunissen, C. E. et al. A consensus protocol for the
standardization of cerebrospinal luid collection and biobanking.
Neurology 73, 1914–1922 (2009).
92. Bruderer, R. et al. Optimization of experimental parameters in
data-independent mass spectrometry signiicantly increases
depth and reproducibility of results. Mol. Cell. Proteom. 16,
2296–2309 (2017).
93. Reiter, L. et al. Protein identiication false discovery
rates for very large proteomics data sets generated by
tandem mass spectrometry*. Mol. Cell. Proteom. 8,
2405–2417 (2009).
94. Savitski, M. M., Wilhelm, M., Hahne, H., Kuster, B. & Bantscheff,
M. A scalable approach for protein false discovery rate estimation
in large proteomic data sets. Mol. Cell. Proteom. 14, 2394–2404
(2015).
95. Muntel, J. et al. Surpassing 10,000 identiied and quantiied
proteins in a single run by optimizing current LC–MS
instrumentation and data analysis strategy. Mol. Omics 15,
348–360 (2019).
96. Bruderer, R. et al. Extending the limits of quantitative proteome
proiling with data-independent acquisition and application to
acetaminophen-treated three-dimensional liver microtissues.
Mol. Cell. Proteom. 14, 1400–1410 (2015).
97. Teo, G. et al. MapDIA: preprocessing and statistical analysis of
quantitative proteomics data from data independent acquisition
mass spectrometry. J. Proteom. 129, 108–120 (2015).
98. Hui, W., Gel, Y. R. & Gastwirth, J. L. Lawstat: an R package
for law, public policy and biostatistics. J. Stat. Softw. 28,
1–26 (2008).
99. Ameijeiras, J., Rosa, A., Crujeiras, M. & Rodríguez-Casal, A.
multimode: an R package for mode assessment. J. Stat. Softw. 97,
1–32 (2018).
100. Buchan, D. W. A. & Jones, D. T. The PSIPRED Protein
Analysis Workbench: 20 years on. Nucleic Acids Res. 47,
W402–W407 (2019).
101. Bonet, J., Harteveld, Z., Sesterhenn, F., Scheck, A. & Correia,
B. E. Rstoolbox—a Python library for large-scale analysis of
computational protein design data and structural bioinformatics.
BMC Bioinform. 20, 240 (2019).
102. Zhao, B. et al. DescribePROT: database of amino acid-level
protein structure and function predictions. Nucleic Acids Res. 49,
D298–D308 (2021).
103. Peng, K., Radivojac, P., Vucetic, S., Dunker, A. K. & Obradovic,
Z. Length-dependent prediction of protein intrinsic disorder. BMC
Bioinform. 7, 208 (2006).
104. Faraggi, E., Zhou, Y. & Kloczkowski, A. Accurate single-sequence
prediction of solvent accessible surface area using local and
global features. Proteins 82, 3170–3176 (2014).
105. Yan, J. & Kurgan, L. DRNApred, fast sequence-based method that
accurately predicts and discriminates DNA-and RNA-binding
residues. Nucleic Acids Res. 45, e84 (2017).
106. Szklarczyk, D. et al. STRING v11: protein-protein association
networks with increased coverage, supporting functional
discovery in genome-wide experimental datasets. Nucleic Acids
Res. 47, D607–D613 (2019).
107. Berman, H. M. et al. The protein data bank. Nucleic Acids Res. 28,
235–242 (2000).
108. Alexa, A. & Rahnenfuhrer, J. topGO: Enrichment analysis for gene
ontology. Bioconductor R package (2020).
109. Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for
generalized linear models via coordinate descent. J. Stat. Softw.
33, 1–22 (2010).
110. Quast, J.-P., Schuster, D. & Picotti, P. protti: an R package for
comprehensive data analysis of peptide- and protein-centric
bottom-up proteomics data. Bioinform. Adv. 2, 1 (2022).
111. Perez-Riverol, Y. et al. The PRIDE database resources in 2022: a
hub for mass spectrometry-based proteomics evidences. Nucleic
Acids Res. 50, D543–D552 (2022).
112. Chen, H. M., Lin, C. Y. & Wang, V. Amyloid P component as a
plasma marker for Parkinson’s disease identified by a proteomic
approach. Clin. Biochem. 44, 377–385 (2011).
113. Dong, M. X. et al. Serum butyrylcholinesterase activity: A
biomarker for Parkinson’s disease and related dementia. Biomed
Res. Int. 2017, 1524107 (2017).
Acknowledgements
We gratefully acknowledge all individuals who donated samples used in
this project. We thank: K. van Dijk and L. Oosterveld for help collecting
CSF samples and clinical datasets; N. Majbour and O. El-Agnaf for
collection of the alpha-synuclein datasets; and the Netherlands Brain
Bank for postmortem brain tissue samples. M.-T. M. was supported by
a long-term EMBO postdoctoral fellowship (ALTF 522-2019). L. N. was
funded by DFG (grant CRC 1310 and grant agreement no. 398882498)
and the German Academic Exchange Service (Forschungsstipendium
fuer Doktorandinnen und Doktoranden). J. G. was funded by DFG (grants
CRC 680 and CRC 1310). A. B. acknowledges funding by DFG (grant
CRC 1310 and grant agreement no. 398882498). P. P. was funded by
a Personalized Health and Related Technologies (PHRT) grant (PHRT-
506), a Sinergia grant from the Swiss National Science Foundation
(SNSF grant CRSII5_177195), the Peter Bockhoff Stiftung and the ETH
Zurich Foundation, Parkinson Schweiz, the European Research Council
(866004), and the EPIC-XS Consortium (823839), the last two under
the EU Horizon 2020 program. W. D. J. v. d. B. was financially supported
by grants from Amsterdam Neuroscience, Dutch Research Council
(ZonMW 70-73305-98-106; 70-73305-98-102; 40-46000-98-101),
Michael J. Fox Foundation (17253), and Dutch Parkinson Association
(2020-G01). Some figures were created with BioRender.com.
Author contributions
P. P. conceived the project. M.-T. M. conceived the experimental pipeline
with input from P. P. and A. B. M.-T. M., A. B. and P. P. designed the
experiments. M.-T. M. performed the experiments. M.-T. M., L. N. and
F. S. analyzed the data. J. M., R. B., L. R. and W. D. J. v. d. B. collected
the data. P. S. and M.-T. M. designed and analyzed in vitro experiments.
W. D. J. v. d. B. provided the clinical samples. L. N. and J. G. performed
the statistical analysis with input from A. B. N. d. S. supervised writing
of the manuscript. M.-T. M., L. N., F. S., N. d. S., A. B. and P. P. wrote the
manuscript. A. B. and P. P. supervised the project. All authors discussed
and revised the final manuscript prior to submission.
Competing interests
The authors RB, JM and LR are full-time employees of Biognosys AG
(Zurich, Switzerland). Spectronaut is a trademark of Biognosys AG.
PP is an inventor of a patent licensed by Biognosys AG that covers
the LiP–MS method used in this manuscript. WvdB performed
contract research and consultancy for Hoffmann-La Roche, Roche
Tissue Diagnostics, Crossbeta Sciences, Discoveric Bio and received
research consumables from Hoffmann-La Roche and Prothena. The
remaining authors declare no competing interests.
Additional information
Extended data is available for this paper at
https://doi.org/10.1038/s41594-022-00837-0.
Supplementary information The online version
contains supplementary material available at
https://doi.org/10.1038/s41594-022-00837-0.
Correspondence and requests for materials should be addressed to
Andreas Beyer or Paola Picotti.
Peer review information Nature Structural and Molecular Biology
thanks Tiago Outeiro, Marcus Bantscheff, and Laura Parkkinen for
their contribution to the peer review of this work. Primary Handling
editors: Anke Sparmann and Florian Ullrich, in collaboration with the
Nature Structural & Molecular Biology team. Peer reviewer reports
are available.
Reprints and permissions information is available at
www.nature.com/reprints.
Nature Structural & Molecular Biology
Extended Data Fig. 1 | Global characteristics of study population.
a, Characteristics of the study cohort. P values were estimated via Wilcoxon
rank sum test. b, Age distribution within the healthy (HG) and PD (PDG) cohort
groups, separated by sex. Boxplots: median, center; first and third quantile,
lower and upper hinges; largest/smallest value no further than 1.5 * inter-quantile
range of the hinge, whiskers; data points beyond are defined as outliers and
plotted individually. P values are indicated (Wilcoxon rank sum test, n=51
subjects in HG, n=52 subjects PDG). c, GO enrichment of all identified proteins
in the CSF proteome (trypsin-only control data) using the human proteome
(UniProt FASTA, July 2019) as the background. Only the 10 terms with the highest
enrichment per GO domain are shown. Numerical data for graphs in b and c are
available as source data.
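The box-plot convention stated in this caption (median at the center, first and third quartiles as hinges, whiskers at the most extreme values within 1.5 × the inter-quartile range of the hinges, everything beyond plotted as outliers) can be sketched as follows. This is a minimal illustration, not the authors' plotting code: the function name `boxplot_stats`, the choice of linearly interpolated quartiles and the example ages are all assumptions made here.

```python
from statistics import quantiles


def boxplot_stats(values):
    """Summarize values with the box-plot convention described in the caption.

    Assumption: quartiles use linear interpolation between order statistics
    (method="inclusive"); the source does not state which definition was used.
    """
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations still inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "median": median,
        "hinges": (q1, q3),
        "whiskers": (min(inside), max(inside)),
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Hypothetical ages, invented for illustration only (not study data).
ages = [52, 58, 60, 61, 63, 64, 66, 67, 70, 95]
stats = boxplot_stats(ages)
print(stats)
```

With these invented values the single age of 95 falls beyond the upper fence and is reported as an outlier, while the upper whisker stops at 70.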
Extended Data Fig. 2 | Comparison of structural features of variable and
non-variable peptides, and the proteins containing these peptides, in CSF.
a, Distribution of secondary structures as loops, alpha-helix and beta-strands
for variable/non-variable peptides, as predicted using PSIPRED. Boxplots for all
panels: median, center; first and third quantile, lower and upper hinges; largest/
smallest value no further than 1.5 * inter-quantile range of the hinge, whiskers;
data points beyond are defined as outliers and plotted individually. P values are
indicated (Wilcoxon rank sum test; ns, not significant; 9385 non-variable, 386
medium-variable, 117 high-variable peptides, from 51 subjects). b, c, Predicted
propensity of peptides to bind DNA or RNA. d, Predicted solvent accessible
surface area of variable/non-variable peptides. e, Number of high-confidence
interactions in STRING for proteins with at least one highly variable peptide (red),
as compared to all other proteins (gray). f, g, Sequence length and number of
domains as annotated in the PFAM database for proteins with at least one highly
variable peptide (red), as compared to all other proteins (gray). h, Side-by-
side view of affected peptides from in situ (left, reproduced from Fig. 2g for
comparison) and in vitro experiments (right). Structure of human brain fructose
bisphosphate aldolase (PDB entry 1XFB). The enzyme is a homotetramer, one
subunit is shown as light blue cartoon and the other 3 subunits are shown as gray
surface. The substrate is represented as yellow spheres, based on an alignment of
PDB entry 1XFB with the ligand bound structure of the muscle isoform (PDB entry
4ALD). For the in situ data (left), the highly variable peptides are highlighted
in dark red (bimodal) and salmon (unimodal). For the in vitro data (right), the
significant peptides in the presence and absence of fructose 1,6-bisphosphate
are highlighted in red (log2 fold change<−1 or > 1). The bimodal and non-bimodal
peptides identified in situ are encircled. Numerical data for graphs in a-g are
available as source data.
Extended Data Fig. 3 | Effects of the sex variable on the linear model and
overlap between the brain and CSF data sets. a, The histogram visualizes the
P values (calculated via t-statistics) of the cohort variable estimated from
the linear model describing effects of structural variation, with the indicated
combinations of the sex variable and interactions with sex taken into account.
For all models, the first bar (extreme left) indicates significant (<0.05) P values.
b, Number of proteins and peptides of the CSF and brain samples used for
estimating the linear models, visualizing the overlapping peptides and proteins
between the two tissues. c, Number of candidate peptides from the CSF where
the coefficients change in the same or different direction in the brain samples.
Numerical data for graphs in a and c are available as source data.
Extended Data Fig. 4 | Structural changes in selected CSF and brain proteins.
a, b, Structure of PSAT1 (PDB 3E77) colored according to peptides in CSF (a)
and brain (b) data. PSAT catalyzes an important step in serine biosynthesis.
Black indicates all analyzed peptides; red indicates the candidate peptide (a) or
peptides with a significant P value (b). One candidate CSF peptide is only 6.0 Å
away from the active site. c, Coverage plot for all analyzed PSAT1 peptides in
CSF (top) and brain (bottom). Black represents fully tryptic; gray represents
half tryptic peptides and red bars the candidate/significant peptide in the CSF/
brain. d, e, Structure of PRDX6 (PDB 1PRX) colored according to peptides in CSF
(d) and brain (e) data. Colors as in a, b. f, Coverage plot for all analyzed PRDX6
peptides in CSF (top) and brain (bottom). Colors as in c. g, Structure of AFM
(PDB 5OKL). h, Coverage plot for AFM. AFM level in serum exosomes was linked
to PD progression (ref. 4). i, Scaled residual plots for AFM. Boxplots are as in Extended
Data Fig. 2 (n=51 subjects in each HG and PDG). j, Structure of SAMP (PDB entry
1GYK) as pentamer with bound calcium (yellow spheres), an abundance-based
plasma biomarker candidate for PD (ref. 112). k, Coverage plot of SAMP. l, Scaled residual
plots for SAMP (ERVGEYSLYIGR: n=47 subjects in HG and n=50 subjects in PDG;
IVLGQEQDSYGGK: n=51 subjects in each HG and PDG). m, Structure of BCHE
(PDB entry 1P0I) in complex with butanoic acid (yellow spheres). Activity of this
enzyme is decreased in PD with dementia (PDD) (ref. 113); note that BCH inhibition was
not used in our cohort. n, Coverage plot for BCHE. o, Scaled residual plot for
BCHE (NIAAFGGNPK: n=51 subjects in each HG and PDG; IFFPQVSEFGK: n=43
subjects in HG and n=48 subjects in PDG). Peptide colors in panels G, J, M are as
in D; colors in panels H, K, N are as in C. Numerical data for graphs in c, f, h, i, k, l, n
and o are available as source data.
Extended Data Fig. 5 | Classification of Parkinson’s Disease in CSF data.
a, ROC curves for classification of PD based on LiP peptide variation. In this
case, LiP peptide intensities were neither corrected for trypsin-only peptide
intensities nor for protein abundance. b, ROC curves for classification of PD
based on LiP peptide variation. In this case, LiP peptide intensities were not
corrected for protein abundance. c, ROC curves for classification of PD based
on LiP peptide variation. In this case, LiP peptide intensities were not corrected
for trypsin-only peptide intensities. d, ROC curves for classification of PD
based on ELISA measurement of different α-synuclein species from. e, Total
α-synuclein levels compared to the ratio of the oligomeric/total α-synuclein level
across the cohort. f, Oligomeric α-synuclein levels compared to the ratio of the
oligomeric/total α-synuclein level across the cohort. For (e) and (f), each dot
represents a single individual. g, Comparison of classification of the PDG using
the ratio of oligomeric to total α-synuclein (log odds plotted on x axis) and using
a combination of five LiP peptide levels (log odds plotted on y axis). Each point
represents an individual and the HY-stage is indicated by color. Numerical data
for graphs in a-g are available as source data.
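The ROC curves referred to in this caption summarize how well a per-individual score, such as the log odds from a classifier, separates the PD group from the healthy group. The sketch below shows one way such a curve and its area could be computed. It is an illustration under stated assumptions, not the authors' pipeline: the function names `roc_points` and `auc`, the scores and the labels are all hypothetical.

```python
def roc_points(scores, labels):
    """Return (false-positive rate, true-positive rate) pairs.

    One point is produced per distinct score threshold, from strictest to
    most lenient; labels are 1 for PD and 0 for healthy.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical log odds and group labels, invented for illustration only.
log_odds = [2.1, 1.3, 0.4, -0.2, 0.9, -1.5, -0.7, 0.1]
is_pd = [1, 1, 1, 1, 0, 0, 0, 0]

curve = roc_points(log_odds, is_pd)
area = auc(curve)
print(curve)
print(area)
```

The area equals the fraction of PD/healthy pairs in which the PD individual receives the higher score, so 0.5 corresponds to chance and 1.0 to perfect separation.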
... 2 protein interactions 17 , aging-related changes in yeast 18 , Caenorhabditis elegans 19 and mice 20 as well as the use of multi-dimensional protein-structural changes as a new class of disease biomarkers 21 . To quantify changes in the structural protein accessibility between conditions, lysates from all conditions undergo short (i.e. ...
... We additionally utilized a human cerebrospinal fluid (CSF) dataset containing 52 samples from healthy individuals each measured with LiP and TrP digestion 21 . This human data set differs from the yeast data, in that the samples contain more variation due to study variables, such as age, sex or environment, and have a much more complex and varied genetic background. ...
... Human CSF samples were prepared and analyzed as described by Mackmull et al. 21 . The results from the performed Spectronaut search were exported using the LiPAnalyzeR_SpectroScheme format. ...
Preprint
Limited proteolysis combined with mass spectrometry (LiP-MS) facilitates probing structural changes on a proteome-wide scale in situ. Distinguishing the different signal contributions, such as changes in protein abundance, from protein structural changes remains challenging. We propose a two-step approach, first removing unwanted variations from the LiP signal that are not caused by protein structural effects and subsequently inferring the effects of variables of interest on the remaining signal. Using LiP-MS data from three species we demonstrate that our framework provides a uniquely powerful approach for deconvolving LiP-MS signals and inferring protein structural changes.
... In LiP-MS experiments, the use of broad-specificity proteases under native conditions dictates cleavage by the structural features of the protein (Feng et al, 2014). Therefore, the identification of variations in LiP-MS profiles allows the recognition of protein regions implicated in structural rearrangements (Schopper et al., 2017), as demonstrated in the analysis of cell lysates (Di Michele et al, 2015;Liu & Fitzgerald, 2016;Zuo et al, 2021), human plasma samples, and cerebrospinal fluid (Mackmull et al, 2022;Shuken et al, 2022;Yang et al, 2015), which can provide information regarding pathological conditions and guide biomarker identification (Mackmull et al., 2022). ...
Preprint
Diverse proteomics-based strategies have been applied to saliva to quantitatively identify diagnostic and prognostic targets for oral cancer. Considering that these potential diagnostic and prognostic factors may be regulated by events that do not imply variation in protein abundance levels, we investigated the hypothesis that changes in protein conformation can be associated with diagnosis and prognosis, revealing biological processes and novel targets of clinical relevance. For this, we employed limited proteolysis-mass spectrometry in saliva samples to explore structural alterations, comparing the proteome of healthy control and oral squamous cell carcinoma (OSCC) patients, with and without lymph node metastasis. Fifty-one proteins with potential structural rearrangements were associated with clinical patient features. Post-translational modifications, such as glycosylation, disulfide bond, and phosphorylation, were also investigated in our data using different search engines and in silico analysis indicating that they might contribute to structural rearrangements of the potential diagnostic and prognostic markers here identified. Altogether, this powerful approach allows for a deep investigation of complex biofluids, such as saliva, advancing the search for targets for oral cancer diagnosis and prognosis. Graphical Abstract Oral cancer progression is associated with potential structural rearrangements.
... (such as epitope sites in antigens) using protease restriction or covalent labeling to identify exposed regions of proteins. 8 The data generated in protein footprinting experiments is often low resolution, but the potential scale of experiments has made it an attractive method. In 2010 West et al. showed proteome-scale ...
... folding in cell lysates and biofluids. 14,15,5,16 In 2015 Espino et al. used lasers to activate hydroxy radicals in vivo to label proteins, providing the first attempt to footprint an intact cell. 17 Their approach has now been ...
Preprint
Mass spectrometry-based methods can provide a global expression profile and structural readout of proteins in complex systems. Preserving the in vivo conformation of proteins in their innate state is challenging during proteomic experiments. Here, we introduce an approach using perfusion of reagents to create a whole animal in vivo protein footprinting method that adds dimethyl labels to exposed lysine residues on intact proteins to maintain information on protein conformations. When this approach was used to measure dynamic structural changes during Alzheimer’s disease (AD) progression in a mouse model, we detected 433 proteins that underwent structural changes attributed to AD, independent of aging, across 7 tissues. We identified structural changes of co-expressed proteins and linked the communities of these proteins to their biological functions. Our findings show that structural alterations of proteins precede changes in expression, thereby showing the value of in vivo protein conformation measurement. Our method represents a new strategy for untangling mechanisms of proteostasis dysfunction caused by protein misfolding. In vivo whole-animal footprinting should have broad applicability for discovering conformational changes in systemic diseases and therapeutic interventions.
... In the latter case, a key mechanism consists of an altered interaction of the protein with lipid membranes, which leads to α-Syn oligomerization [148] and to the subsequent formation of complex aggregates including fibrillar α-Syn and fragments of altered lipids membranes [40]. Understanding the interaction of α-Syn oligomers with lipid membranes can strongly have an impact on deciphering the molecular pathogenesis of synucleinopathies and might provide interesting insights for early diagnosis and disease-modifying therapies, by providing new possible strategies for discovering novel biomarkers based on a combined approach between proteomics and lipidomics [149]. ...
Article
The present review provides a comprehensive examination of the intricate dynamics between α-synuclein, a protein crucially involved in the pathogenesis of several neurodegenerative diseases, including Parkinson’s disease and multiple system atrophy, and endogenously-produced bioactive lipids, which play a pivotal role in neuroinflammation and neurodegeneration. The interaction of α-synuclein with bioactive lipids is emerging as a critical factor in the development and progression of neurodegenerative and neuroinflammatory diseases, offering new insights into disease mechanisms and novel perspectives in the identification of potential biomarkers and therapeutic targets. We delve into the molecular pathways through which α-synuclein interacts with biological membranes and bioactive lipids, influencing the aggregation of α-synuclein and triggering neuroinflammatory responses, highlighting the potential of bioactive lipids as biomarkers for early disease detection and progression monitoring. Moreover, we explore innovative therapeutic strategies aimed at modulating the interaction between α-synuclein and bioactive lipids, including the development of small molecules and nutritional interventions. Finally, the review addresses the significance of the gut-to-brain axis in mediating the effects of bioactive lipids on α-synuclein pathology and discusses the role of altered gut lipid metabolism and microbiota composition in neuroinflammation and neurodegeneration. The present review aims to underscore the potential of targeting α-synuclein-lipid interactions as a multifaceted approach for the detection and treatment of neurodegenerative and neuroinflammatory diseases.
... Gold standard methods currently used for biomarker identification and quantification in NDD research, such as mass spectrometry (MS) and enzyme-linked immunosorbent assay (ELISA), focus on quantifying the level of target proteins but are insensitive to changes in their structural states and are thus not able to discriminate between different structural forms (20,21). Progress in proteomic methods such as limited proteolysis-based MS (LiP-MS) shows promise in identifying even the structural alterations of proteins in complex samples on a global scale (22,23). However, these MS-based techniques are in the early stages of development, and integration of them into clinical practices remains challenging. ...
Article
Diagnosis of neurodegenerative disorders (NDDs) including Parkinson's disease and Alzheimer's disease is challenging owing to the lack of tools to detect preclinical biomarkers. The misfolding of proteins into oligomeric and fibrillar aggregates plays an important role in the development and progression of NDDs, thus underscoring the need for structural biomarker-based diagnostics. We developed an immunoassay-coupled nanoplasmonic infrared metasurface sensor that detects proteins linked to NDDs, such as alpha-synuclein, with specificity and differentiates the distinct structural species using their unique absorption signatures. We augmented the sensor with an artificial neural network enabling unprecedented quantitative prediction of oligomeric and fibrillar protein aggregates in their mixture. The microfluidic integrated sensor can retrieve time-resolved absorbance fingerprints in the presence of a complex biomatrix and is capable of multiplexing for the simultaneous monitoring of multiple pathology-associated biomarkers. Thus, our sensor is a promising candidate for the clinical diagnosis of NDDs, disease monitoring, and evaluation of novel therapies.
Article
Proteomic profiling is an effective way to identify biomarkers for Parkinson’s disease (PD). Cerebrospinal fluid (CSF) has direct connectivity with the brain and could be a source of finding biomarkers and their clinical implications. Comparative proteomic profiling has shown that a group of differentially displayed proteins exist. The studies performed using conventional and classical tools also supported the occurrence of these proteins. Many studies have highlighted the potential of CSF proteomic profiling for biomarker identification and their clinical applications. Some of these proteins are useful for disease diagnosis and prediction. Proteomic profiling of CSF also has immense potential to distinguish PD from similar neurodegenerative disorders. A few protein biomarkers help in fundamental knowledge generation and clinical interpretation. However, the specific biomarker of PD is not yet known. The use of proteomic approaches in clinical settings is also rare. A large-scale, multi-centric, multi-population and multi-continental study using multiple proteomic tools is warranted. Such a study can provide valuable, comprehensive and reliable information for a better understanding of PD and the development of specific biomarkers. The current article sheds light on the role of CSF proteomic profiling in identifying biomarkers of PD and their clinical implications. The article also explains the achievements, obstacles and hopes for future directions of this approach.
Article
Cell membrane composed of lipid bilayer is a selectively permeable membrane that only allows specific molecules to pass through. While such selectivity is essential for the survival and function of cells, there exist instances when it is necessary to overcome the cell membrane barrier. Study on the artificial cell membrane barrier breakthrough strategies is of great significance for the development of drug delivery systems and the understanding of cellular behaviors. Herein, the advancements in the development of strategies for opening cell membrane barriers over the past decade are summarized. The main transmembrane mechanisms are elucidated and then divided into three categories, i.e., cell perforation via microinjection/external physical fields, cell endocytosis‐assisted construction of artificial transmembrane channels, and untethered micro/nanomachines. Next, the potential applications after opening the cell membrane are discussed, which mainly focus on the transmembrane cargo delivery into the cell and endogenous substance extraction out from the cell. Finally, this review outlines the current challenges that impede the realization of practical applications and presents an outlook of future opportunities to promote further development. Through overcoming the challenges, it is anticipated that artificial cell membrane breakthrough strategies will provide a revolutionized tool in the near future to advance the field of biomedicine and biotechnology.
Article
The DescribePROT database of amino acid-level descriptors of protein structures and functions was substantially expanded since its release in 2020. This expansion includes substantial increase in the size, scope, and quality of the underlying data, the addition of experimental structural information, the inclusion of new data download options, and an upgraded graphical interface. DescribePROT currently covers 19 structural and functional descriptors for proteins in 273 reference proteomes generated by 11 accurate and complementary predictive tools. Users can search our resource in multiple ways, interact with the data using the graphical interface, and download data at various scales including individual proteins, entire proteomes, and whole database. The annotations in DescribePROT are useful for a broad spectrum of studies that include investigations of protein structure and function, development and validation of predictive tools, and to support efforts in understanding molecular underpinnings of diseases and development of therapeutics. DescribePROT can be freely accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.
Article
The formation of soluble α-synuclein (α-syn) and amyloid-β (Aβ) aggregates is associated with the development of Parkinson’s disease (PD). Current methods mainly focus on the measurement of the aggregate concentration and are unable to determine their heterogeneous size and shape, which potentially also change during the development of PD due to increased protein aggregation. In this work, we introduce aptamer-assisted single-molecule pull-down (APSiMPull) combined with super-resolution fluorescence imaging of α-syn and Aβ aggregates in human serum from early PD patients and age-matched controls. Our diffraction-limited imaging results indicate that the proportion of α-syn aggregates (α-syn/(α-syn+Aβ)) can be used to distinguish PD and control groups with an area under the curve (AUC) of 0.85. Further, super resolution fluorescence imaging reveals that PD serums have a higher portion of larger and rounder α-syn aggregates than controls. Little difference was observed for Aβ aggregates. Combining these two metrics, we constructed a new biomarker and achieved an AUC of 0.90. The combination of the aggregate number and morphology provides a new approach to early PD diagnosis.
Article
We present a flexible, user-friendly R package called protti for comprehensive quality control, analysis and interpretation of quantitative bottom-up proteomics data. protti supports the analysis of protein-centric data such as those associated with protein expression analyses, as well as peptide-centric data such as those resulting from limited proteolysis-coupled mass spectrometry analysis. Due to its flexible design, it supports analysis of label-free, data-dependent, data-independent and targeted proteomics datasets. protti can be run on the output of any search engine and software package commonly used for bottom-up proteomics experiments such as Spectronaut, Skyline, MaxQuant or Proteome Discoverer, adequately exported to table format. Availability and implementation protti is implemented as an open-source R package. Release versions are available via CRAN (https://CRAN.R-project.org/package=protti) and work on all major operating systems. The development version is maintained on GitHub (https://github.com/jpquast/protti). Full documentation including examples is provided in the form of vignettes on our package website (jpquast.github.io/protti/).
Article
The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's largest data repository of mass spectrometry-based proteomics data. PRIDE is one of the founding members of the global ProteomeXchange (PX) consortium and an ELIXIR core data resource. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2019. The number of submitted datasets to PRIDE Archive (the archival component of PRIDE) has reached on average around 500 datasets per month during 2021. In addition to continuous improvements in PRIDE Archive data pipelines and infrastructure, the PRIDE Spectra Archive has been developed to provide direct access to the submitted mass spectra using Universal Spectrum Identifiers. As a key point, the file format MAGE-TAB for proteomics has been developed to enable the improvement of sample metadata annotation. Additionally, the resource PRIDE Peptidome provides access to aggregated peptide/protein evidences across PRIDE Archive. Furthermore, we will describe how PRIDE has increased its efforts to reuse and disseminate high-quality proteomics data into other added-value resources such as UniProt, Ensembl and Expression Atlas.
Preprint
Midbrain dopamine (DA) neurons, a population of cells that are critical for motor control, motivated behaviors and cognition, release DA via an exocytotic mechanism from both their axonal terminals and their somatodendritic (STD) compartment. In Parkinson's disease (PD), it is striking that motor dysfunctions only become apparent after extensive loss of DA innervation. Although it has been hypothesized that this resilience is due to the ability of many motor behaviors to be sustained through a basal tone of DA and diffuse transmission, experimental evidence for this is limited. Here we conditionally deleted the calcium sensor synaptotagmin-1 (Syt1) in DA neurons (cKODA mice) to abrogate most activity-dependent axonal DA release in the striatum and mesencephalon, leaving STD DA release intact. Strikingly, Syt1 cKODA mice showed intact performance in multiple unconditioned DA-dependent motor tasks, suggesting that activity-dependent DA release is dispensable for such basic motor functions. Basal extracellular levels of DA in the striatum were unchanged, suggesting that a basal tone of extracellular DA is sufficient to sustain basic movement. We also found multiple adaptations in the DA system of cKODA mice, similar to those happening at early stages of PD. Taken together, our findings reveal the striking resilience of DA-dependent motor functions in the context of a near-abolition of phasic DA release, shedding new light on why extensive loss of DA innervation is required to reveal motor dysfunctions in PD
Article
Background: The role of cerebrospinal fluid (CSF) alpha-synuclein as a potential biomarker has been challenged mainly due to variable preanalytical measures between laboratories. To evaluate the impact of the preanalytical factors contributing to such variability, the different subforms of alpha-synuclein need to be studied individually. Method: We investigated the effect of exposing CSF samples to several preanalytical sources of variability: (1) different polypropylene (PP) storage tubes; (2) use of non-ionic detergents; (3) multiple tube transfers; (4) multiple freeze-thaw cycles; and (5) delayed storage. CSF oligomeric- and total-alpha-synuclein levels were estimated using our in-house sandwich-based enzyme-linked immunosorbent assays. Results: Siliconized tubes provided the optimal preservation of CSF alpha-synuclein proteins among other tested polypropylene tubes. The use of Tween-20 detergent significantly improved the recovery of oligomeric-alpha-synuclein, while multiple freeze-thaw cycles significantly lowered oligomeric-alpha-synuclein in CSF. Interestingly, oligomeric-alpha-synuclein levels remained relatively stable over multiple tube transfers and upon delayed storage. Conclusion: Our study showed for the first time the distinct impact of preanalytical factors on the different forms of CSF alpha-synuclein. These findings highlight the need for special considerations for the different forms of alpha-synuclein during CSF samples’ collection and processing.
Biological processes are regulated by intermolecular interactions and chemical modifications that do not affect protein levels, thus escaping detection in classical proteomic screens. We demonstrate here that a global protein structural readout based on limited proteolysis-mass spectrometry (LiP-MS) detects many such functional alterations, simultaneously and in situ, in bacteria undergoing nutrient adaptation and in yeast responding to acute stress. The structural readout, visualized as structural barcodes, captured enzyme activity changes, phosphorylation, protein aggregation, and complex formation, with the resolution of individual regulated functional sites such as binding and active sites. Comparison with prior knowledge, including other ‘omics data, showed that LiP-MS detects many known functional alterations within well-studied pathways. It suggested distinct metabolite-protein interactions and enabled identification of a fructose-1,6-bisphosphate-based regulatory mechanism of glucose uptake in E. coli. The structural readout dramatically increases classical proteomics coverage, generates mechanistic hypotheses, and paves the way for in situ structural systems biology.
In early-stage Parkinson's disease (PD), cognitive impairment is common, and a variety of cognitive domains including memory, attention, and executive functioning may be affected. Cerebrospinal fluid (CSF) biomarkers are potential markers of cognitive functioning. We aimed to explore whether CSF α-synuclein species, neurofilament light chain, amyloid-β42, and tau are associated with cognitive performance in early-stage PD patients. CSF levels of total-α-synuclein and phosphorylated-α-synuclein, neurofilament light chain, amyloid-β42, and total-tau and phosphorylated-tau were measured in 26 PD patients (disease duration ≤5 years and Hoehn and Yahr stage 1–2.5). Multivariable linear regression models, adjusted for age, gender, and educational level, were used to assess the relationship between CSF biomarker levels and memory, attention, executive and visuospatial function, and language performance scores. In 26 early-stage PD patients, attention and memory were the most commonly affected domains. A higher CSF phosphorylated-α-synuclein/total-α-synuclein ratio was associated with better executive functioning (sβ = 0.40). Higher CSF neurofilament light was associated with worse memory (sβ = −0.59), attentional (sβ = −0.32), and executive functioning (sβ = −0.35). Reduced CSF amyloid-β42 levels were associated with poorer attentional functioning (sβ = 0.35). Higher CSF phosphorylated-tau was associated with worse language functioning (sβ = −0.33). Thus, CSF biomarker levels, in particular neurofilament light, were related to the most commonly affected cognitive domains in early-stage PD. This indicates that CSF biomarker levels may identify early-stage PD patients who are at an increased risk of developing cognitive impairment.
Background: There are currently no treatments that stop or slow the progression of Parkinson's disease (PD). Case–control genome‐wide association studies have identified variants associated with disease risk, but not progression. The objective of the current study was to identify genetic variants associated with PD progression. Methods: We analyzed 3 large longitudinal cohorts: Tracking Parkinson's, Oxford Discovery, and the Parkinson's Progression Markers Initiative. We included clinical data for 3364 patients with 12,144 observations (mean follow‐up 4.2 years). We used a new method in PD, following a similar approach in Huntington's disease, in which we combined multiple assessments using a principal components analysis to derive scores for composite, motor, and cognitive progression. These scores were analyzed in linear regression in genome‐wide association studies. We also performed a targeted analysis of the 90 PD risk loci from the latest case–control meta‐analysis. Results: There was no overlap between variants associated with PD risk, from case–control studies, and PD age at onset versus PD progression. The APOE ε4 tagging variant, rs429358, was significantly associated with composite and cognitive progression in PD. Conditional analysis revealed several independent signals in the APOE locus for cognitive progression. No single variants were associated with motor progression. However, in gene‐based analysis, ATP8B2, a phospholipid transporter related to vesicle formation, was nominally associated with motor progression (P = 5.3 × 10⁻⁶). Conclusions: We provide early evidence that this new method in PD improves measurement of symptom progression. We show that the APOE ε4 allele drives progressive cognitive impairment in PD. Replication of this method and results in independent cohorts are needed. © 2020 The Authors. Movement Disorders published by Wiley Periodicals LLC. on behalf of International Parkinson and Movement Disorder Society.
We present DescribePROT, the database of predicted amino acid-level descriptors of structure and function of proteins. DescribePROT delivers a comprehensive collection of 13 complementary descriptors predicted using 10 popular and accurate algorithms for 83 complete proteomes that cover key model organisms. The current version includes 7.8 billion predictions for close to 600 million amino acids in 1.4 million proteins. The descriptors encompass sequence conservation, position specific scoring matrix, secondary structure, solvent accessibility, intrinsic disorder, disordered linkers, signal peptides, MoRFs and interactions with proteins, DNA and RNAs. Users can search DescribePROT by the amino acid sequence and the UniProt accession number and entry name. The pre-computed results are made available instantaneously. The predictions can be accessed via an interactive graphical interface that allows simultaneous analysis of multiple descriptors and can also be downloaded in structured formats at the protein, proteome and whole database scale. The putative annotations included in DescribePROT are useful for a broad range of studies, including investigations of protein function, applied projects focusing on therapeutics and diseases, and the development of predictors for other protein sequence descriptors. Future releases will expand the coverage of DescribePROT. DescribePROT can be accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.
The Human Proteome Project (HPP) consortium aims to functionally characterize the dark proteome. On the basis of the relevance of olfaction in early neurodegeneration, we have analyzed the dark proteome using data mining in public resources and omics data sets derived from the human olfactory system. Multiple dark proteins localize at synaptic terminals and may be involved in amyloidopathies such as Alzheimer’s disease (AD). We have characterized the dark PITH domain-containing protein 1 (PITHD1) in olfactory metabolism using bioinformatics, proteomics, in vitro and in vivo studies, and neuropathology. PITHD1−/− mice exhibit olfactory bulb (OB) proteome changes related to synaptic transmission, cognition, and memory. OB PITHD1 expression increases with age in wild-type (WT) mice and decreases in Tg2576 AD mice at late stages. The analysis across 6 neurological disorders reveals that olfactory tract (OT) PITHD1 is specifically upregulated in human AD. Stimulation of olfactory neuroepithelial (ON) cells with PITHD1 alters the ON phosphoproteome, modifies the proliferation rate, and induces a pro-inflammatory phenotype. This workflow, applied by the Spanish C-HPP and Human Brain Proteome Project (HBPP) teams across the ON-OB-OT axis, can be adapted as guidance to decipher functional features of dark proteins. Data are available via ProteomeXchange with identifiers PXD018784 and PXD021634.