Clin Chem Lab Med 2003; 41(12):1562–1570 © 2003 by Walter de Gruyter · Berlin · New York
William Clarke, Zhen Zhang and Daniel W. Chan*
Clinical Chemistry Division, Johns Hopkins Medical
Institutions, Baltimore, USA
The term “clinical proteomics” refers to the applica-
tion of available proteomics technologies to current
areas of clinical investigation. The ability to simultane-
ously and comprehensively examine changes in large
numbers of proteins in the context of disease or other
changes in physiological conditions holds great
promise as a tool to unlock the solutions to difficult
clinical research questions. Proteomics is a rapidly
growing field that combines high throughput analyti-
cal methodologies such as two-dimensional gel elec-
trophoresis and SELDI mass spectrometry methods
with complex bioinformatics to study systems biol-
ogy – the system of interest is defined by the investi-
gator. Even with all its potential, however, studies
must be carefully designed in order to differentiate
true clinical differences in protein expression from dif-
ferences originating from variation in sample collec-
tion, variation in experimental condition, and normal
biological variability. Proteomic analyses are already
widely in use for clinical studies ranging from cancer to
other diseases such as cardiovascular disease, organ
transplant, and pharmacodynamic studies. Clin Chem
Lab Med 2003; 41(12):1562–1570
Key words: Clinical proteomics; Cancer proteomics;
Mass spectrometry; 2-D Gel electrophoresis.
Abbreviations: 2DE, 2-D gel electrophoresis; BPH, be-
nign prostatic hyperplasia; CAD, coronary artery dis-
ease; CsA, cyclosporine A; CSF, cerebrospinal fluid;
ESI, electrospray ionization; FT-MS, Fourier-transform
mass spectroscopy; HUPO, Human Proteome Organi-
zation; LCM, laser capture microdissection; MALDI,
matrix-assisted laser desorption ionization; MS, mass
spectrometry; MS/MS, tandem mass spectrometry;
NAF, nipple aspirate fluid; PCA, prostate cancer; PSA,
prostate specific antigen; SELDI, surface-enhanced
laser desorption ionization; TOF, time-of-flight; UMSA,
Unified Maximum Separability Analysis.
Introduction
The term “proteome” was introduced nearly ten years
ago by Marc Wilkins to describe “all proteins ex-
pressed by a genome or tissue” (1). An alternative but
similar definition of a proteome is a set of all expressed
proteins in a cell, tissue or organism (2). Proteomics,
then, can be described as a systematic analysis of pro-
teins within a defined system regarding their identity,
quantity, and function (3). Clinical proteomics is a
rapidly growing field of increasing importance that
demonstrates promise for the identification of new tar-
gets for treatment and therapeutic intervention, as well
as biomarkers for diagnosis, prognosis, and therapeu-
tic efficacy by using current technology to compare
proteome profiles between differing physiological and
disease states (4, 5).
The main difference between genomics and pro-
teomics is that the genome is a static entity that goes
relatively unchanged from day to day, while the pro-
teome is a dynamic collection of proteins that demon-
strate variation between individuals, between cell
types, and between entities of the same type but under
different pathological or physiological conditions (1).
The states of proteins within a patient also change over
time and in response to multiple external stimuli. The
traditional concept of one gene-one protein has given
way to the knowledge that one gene can produce a heterogeneous
protein population that has multiple related
structures with similar physicochemical properties
due to post-translational modification (phosphoryla-
tion, glycosylation, ubiquitination) at multiple sites
within a protein, or conformational changes resulting
from genetic polymorphisms (6, 7). In addition, the
quantity of protein produced can vary greatly based on
the individual in question or the patient/system envi-
ronment. These factors highlight the fact that compre-
hensive proteomics is shaping up to be much more
challenging than genomic analysis (8).
The typical approach to biochemical analyses is to
identify a system and develop an assay to monitor ac-
tivity within that system, followed by identification and
characterization of the protein component responsible
for the activity. However, advances in technology for
protein purification and identification have driven pro-
teomics research to a different approach where com-
prehensive protein databases for individual conditions
can be used to characterize individual patients and dis-
ease states (9) by studying systems biology. This ap-
proach does not begin with a specific activity or ques-
tion and is therefore not hypothesis driven (or the
hypotheses are very broad, i.e., “there will be a change
in protein expression between two patient states”),
which limits the need for analyte specific assays. While
the problem of developing analyte-specific assays is
negated through this broad new approach, there are
other obstacles that are equally challenging.
*E-mail of the corresponding author: dchan@jhmi.edu
Review
The Application of Clinical Proteomics to Cancer and other Diseases
Standardization of conditions for analysis is a very
important concept in proteomic analysis and is necessary
for both intra- and inter-laboratory comparisons
due to the dynamic nature of the proteome. The Human
Proteome Organization (HUPO), along with the Plasma
Proteome Project, has been formed to address this is-
sue as well as promote new research, increase aware-
ness of existing research, and foster cooperation be-
tween laboratories to address obstacles to proteomic
research (www.HUPO.org). Quantitation and character-
ization of all proteins in a sample is a difficult task, com-
plicated by the large dynamic range (~10000:1) of ex-
pressed proteins in a particular specimen (10) and the
diversity of protein expression in that specimen (11).
The large amount of data generated from this wide ar-
ray of expressed proteins, along with an extensive list
of potentially important observations, can lead to diffi-
culty in interpretation of results in a biological context
(9), demonstrating the need for increasingly powerful
bioinformatics tools. In addition, success in protein iso-
lation can vary dramatically depending on whether the
protein of interest is free or membrane bound, part of a
complex protein-protein interaction, or sequestered in
a specific cellular component (8). Several analytical
methodologies have been developed in response to
emerging challenges in proteomic analyses. Separa-
tion methods based on a wide range of physical or
chemical properties (e.g., solubility, pI, hydrophobicity)
have been employed to isolate sample proteins, in ad-
dition to development of tandem mass spectrometric
methods for protein identification. In addition, com-
puter engineers and biostatisticians are modifying in-
formatic techniques used for gene array analysis, as
well as developing new methods to sift through the vol-
umes of data produced during proteomic analyses.
Tools for Proteomic Analysis
2D-Electrophoresis
Traditionally, proteomic experiments have been per-
formed based on two-dimensional gel electrophoretic
analyses (12, 13). This is due in large part to its sizable
analyte capacity; in a standard format, 2-D gel elec-
trophoresis (2DE) can visualize 3000–10000 proteins,
depending on the method of spot detection (14). The
2-D gel approach separates proteins based on two dis-
tinct protein characteristics – size and charge. The pri-
mary assumption is that two proteins of the same or
similar mass will be unlikely to carry the same charge.
In the first dimension, the proteins are resolved from
one another by isoelectric focusing using a pH gradient
in either polyacrylamide or agarose gels. The sample
proteins migrate in an applied electric field until they
reach the area of the gradient where the pH is identical
to the pI of the protein. After the initial separation
based on pI, the proteins are transferred in SDS-con-
taining buffer to the second dimension cross-linked
polyacrylamide gel, where they will be further fraction-
ated based on their size. It is possible to look at sub-
populations of proteins, or fine tune the experiment, by
using limited pH gradients (e.g., pH 7–9) or adjusting
the cross-linking of the second dimension gel to opti-
mize resolution in a specific (more narrow) mass
range. Following separation, the proteins are com-
monly visualized by Coomassie or silver staining. For
higher sensitivity detection, fluorescent dyes such as
Sypro Ruby (Molecular Probes, Inc., Eugene, OR, USA)
or radioactive labeling can be used.
While 2DE is able to resolve a large number of pro-
teins, making it a very useful technique for proteomic
analyses, there are some significant drawbacks to its
use as well. Soluble proteins are readily analyzed and
identified using 2-D gel systems, but it is well known
that specific classes of proteins such as very acidic or
basic proteins, membrane proteins and very large or
small proteins can be excluded or underrepresented in
2-D gel patterns (3). In fact, it has been estimated that
50–75% of proteins on the gel, the low abundance pro-
teins, are not detected by staining (15). Co-migration of
proteins can also be a problem during 2-D gel separation.
In addition, the technique is restricted by the limited
solubility of hydrophobic and membrane proteins;
it has a narrow dynamic range and poor sensitivity,
makes quantitation of proteins difficult, and is not easily
automated (14). However, in spite of all the difficulties,
2DE is still a vital component of the proteomics “tool-
box”. No other separation technique possesses its ex-
tremely large capacity coupled with the detailed infor-
mation on a protein, including its relative quantity, pI,
approximate molecular weight and solubility (16). The
automation of many pre-gel processes, implementa-
tion of various fractionation schemes, and its use as a
high capacity/high throughput front-end for mass
spectrometry have enabled it to retain its usefulness
despite limitations.
Mass spectrometry
Mass spectrometry (MS) is a technique that measures
the mass-to-charge ratio (m/z) of ions in the gas phase,
along with the number of ions present at each m/z
value. The mass spectrometer consists of two general
parts, sample introduction/ionization and the mass an-
alyzer/detector. MS is an important analytical tool be-
cause it can provide both quantitative and qualitative
information regarding an analyte of interest. The emergence
of biological MS in the late 1980s using matrix-assisted
laser desorption ionization (MALDI) (17) and
electrospray ionization (ESI) (18) MS provided an im-
portant building block for advances in protein analysis
and proteomics. Coupled with advanced informatic
methodologies, MS is an increasingly important tool
for proteomic investigation.
These ionization techniques (MALDI and ESI) had
such a major impact in protein biochemistry because
they were able to generate ions in the gas phase with-
out extensive fragmentation from large proteins,
where previous ionization approaches were unable to
do so. MALDI produces ions by sublimating and ioniz-
ing the proteins out of a dry, crystalline matrix. Mass
spectra produced from this type of ionization are rela-
tively simple to interpret because primarily singly
charged ions are produced, so each peak corresponds
to an individual sample component. Unless it is cou-
pled with a separation method off-line (e.g., 2-D elec-
trophoresis), the MALDI approach is best for relatively
simple mixtures. A variant of MALDI, where the surface
of the MALDI target has been modified to mimic a chro-
matographic column, has enabled the laser desorption
based method to be used with more complex mixtures.
This technique is known as surface enhanced laser des-
orption ionization (SELDI) MS, and has been widely
used in cancer proteomics. This approach allows char-
acterization and fractionation of sample components
directly at the point of application, based on their dif-
ferential interaction with the surface (19, 20). The ESI
method of ionization produces ions from a solution,
which makes it amenable to coupling with liquid based
separation technologies such as high-performance liq-
uid chromatography (HPLC) or capillary electrophore-
sis (CE). Mass spectra obtained using ESI generally
contain multiply-charged analytes, so data handling
for this approach is more complex; however, the ability
to couple the ionization source directly with a separa-
tion method allows for analysis of more complex mix-
tures.
The mass analyzer component of a mass spectrometer
is as important to proteomic analyses as sample
ionization, because it determines both the type and
quality of the data that are generated.
In the context of proteomics, the key factors to
consider for a mass analyzer are sensitivity, resolution
and mass accuracy (21). The major types of mass ana-
lyzers used for proteomic studies include: quadrupole,
ion trap, time-of-flight (TOF) and Fourier transform ion
cyclotron resonance (FT-MS) analyzers. Quadrupole and ion trap
analyzers operate using applied RF and DC voltage to
establish stable trajectories for ions to reach the detec-
tor. These analyzers are relatively inexpensive and ro-
bust, although they have a limited mass range for
which they are useful (upper limit m/z < 4000), seem-
ingly eliminating them from use for protein analysis.
However, these analyzers are able to be used for pro-
teins when coupled with ESI due to the multiply-
charged ions produced by that source – as z (charge) in-
creases, the m/z decreases to within the useful range of
these analyzers. A TOF analyzer is the simplest configuration;
the time it takes for an ion (in a vacuum) to
reach the detector increases with the square root of its
m/z, so for singly charged ions it reflects the molecular
weight of the protein. It is also more sensitive and accurate
than the quadrupole or ion trap analyzers. An
FT-MS instrument captures and analyzes ions in a high
vacuum using a high strength magnetic field, but its
high expense and technical complexity limit its use in
routine proteomic analyses.
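The effect of charge state on m/z, and the dependence of flight time on m/z, can be illustrated with a minimal sketch. The protein mass (16951 Da), the 20 kV accelerating voltage and the 1 m flight tube are illustrative assumptions rather than values from the studies reviewed here, and the flight-time expression is the idealized linear TOF relation t = L·sqrt((m/z)/(2eV)), which ignores initial energy spread and instrument-specific corrections.

```python
import math

PROTON_MASS_DA = 1.007276            # mass of a proton in daltons
DALTON_KG = 1.66053907e-27           # one dalton in kilograms
ELEMENTARY_CHARGE_C = 1.602176634e-19


def mz(neutral_mass_da, charge):
    """m/z of a positive ion formed by adding `charge` protons."""
    return (neutral_mass_da + charge * PROTON_MASS_DA) / charge


def tof_seconds(mz_value, voltage_v=20_000.0, tube_m=1.0):
    """Idealized linear TOF flight time: t = L * sqrt((m/z) / (2 e V)).

    voltage_v and tube_m are assumed, illustrative instrument settings.
    """
    mass_per_charge = mz_value * DALTON_KG / ELEMENTARY_CHARGE_C  # kg per coulomb
    return tube_m * math.sqrt(mass_per_charge / (2.0 * voltage_v))


protein_mass = 16_951.0  # hypothetical protein mass in daltons
for z in (1, 4, 5, 10, 20):
    value = mz(protein_mass, z)
    # m/z < 4000 is the quadrupole/ion trap limit quoted in the text
    print(f"z={z:2d}  m/z={value:9.2f}  below 4000: {value < 4000}  "
          f"flight time={tof_seconds(value) * 1e6:6.2f} us")
```

Under these assumptions the singly charged ion lies far outside the quoted quadrupole range, while charge states of five or more bring the same protein below m/z 4000, which is why ESI makes these analyzers usable for proteins.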
Quadrupole and TOF analyzers can be combined in
various configurations to perform what is known as
tandem mass spectrometry (MS/MS). This technique
involves selection of precursor ions with the first ana-
lyzer, followed by induced fragmentation of the precur-
sor ion in a collision cell and analysis of the resultant
secondary fragments with a second mass analyzer.
Typical configurations for tandem MS include a triple
quadrupole instrument (QQQ), a quadrupole coupled
to a TOF analyzer (QTOF), or two TOF analyzers sepa-
rated by a collision cell (TOF-TOF). The ion trap and FT-
MS instruments are trapping analyzers and can per-
form MS/MS without incorporation of a second mass
analyzer; in fact, these instruments can perform multiple
iterations of fragmentation followed by mass analysis
to the nth degree (MSⁿ). Although MS is finding use
as a screening technique for protein profiling (22–25),
its primary use is for identification of proteins of inter-
est (26, 27) and characterization of post-translational
modifications (28, 29).
Bioinformatics for Proteomic Analysis
The massive amount of data generated during pro-
teomic analysis experiments requires sophisticated in-
formatic methods to generate valid conclusions re-
garding the data. Proteome bioinformatics involves the
collection, storage, searching, analysis, classification,
management, archiving and retrieval of data in readily
accessible databases. A comprehensive and detailed
discussion of all currently available proteome infor-
matic tools is beyond the scope of this Review. One of
the primary applications of proteome informatics is to
identify proteins of interest from mass spectrometric
data in both expression and functional proteomic
analyses. This is accomplished by matching peptide
mass spectra to a sequence database following enzy-
matic digestion of the sample. Peptide mass finger-
printing (or peptide mass analysis) is accomplished by
comparison of peptide molecular weights with theoret-
ical masses of peptides derived from computer mod-
eled digestion of sequences in a target database with
the same enzyme (26). An alternative method for pro-
tein identification uses MS/MS and takes advantage of
the fact that precursor ions dissociate into fragment
ions along known pathways. When peptide MS/MS
data are used along with a partial peptide sequence
that is manually generated to search a sequence database,
it is known as peptide sequencing or a peptide sequence
tag query (30–32). A database search without a
manually generated peptide sequence is known as
MS/MS ion search analysis (32, 33).
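The logic of peptide mass fingerprinting can be sketched in a few lines: digest each database sequence in silico, compute the theoretical peptide masses, and count how many observed masses fall within a tolerance. The sketch below is illustrative only. The two sequences, the 0.2 Da tolerance and the function names are hypothetical; it compares neutral monoisotopic masses and ignores missed cleavages and modifications, which real search engines must handle.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_digest(sequence):
    """Cleave C-terminal to K or R, except when the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def rank_candidates(observed_masses, database, tolerance_da=0.2):
    """Rank database entries by the number of observed masses they explain."""
    ranked = []
    for name, sequence in database.items():
        theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
        hits = sum(
            any(abs(obs - t) <= tolerance_da for t in theoretical)
            for obs in observed_masses
        )
        ranked.append((name, hits))
    return sorted(ranked, key=lambda item: -item[1])


# Hypothetical two-entry database and simulated measurements (0.05 Da error).
database = {
    "protein_A": "MKWVTFISLLFLFSSAYSR",
    "protein_B": "GAVLKDEFGHRPSTK",
}
observed = [peptide_mass(p) + 0.05 for p in tryptic_digest(database["protein_B"])]
ranked = rank_candidates(observed, database)
print(ranked)
```

Counting matched masses is the simplest possible score; it is used here only to make the matching step concrete.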
For samples containing only one protein or simple
mixtures of a few proteins, peptide mass fingerprinting
is a suitable approach to identification. However, for
more complex mixtures of proteins, MS/MS is the ap-
proach that is most effective. It is important to note that
there are some significant obstacles to protein identifi-
cation through database searching. Due to these is-
sues, it is not uncommon to encounter a situation
where mass spectrometric data cannot be correlated
with any sequences from the database. These prob-
lems include, but are not limited to: absence of the
sequence in the database searched, the presence of un-
expected co-translational or post-translational modifi-
cation, chemical modification due to sample handling,
the presence of peptides produced by unexpected or
non-specific cleavage, and unclear data due to poor quality
spectra or non-peptide contaminants (34). While not
perfect, these approaches are the common methods in
use at the present time; however, there is continuous
progress in advancing and improving the current tech-
niques.
Another important application of bioinformatics in
clinical proteomics is to identify proteins that are dif-
ferentially expressed in specimens associated with dif-
ferent disease states. For problems in which the total
number of proteins being investigated is relatively
small, established statistical approaches work well.
However, for research such as biomarker discovery
through proteomic profiling, the total number of proteins
being evaluated is typically much greater than
the number of clinical samples. Currently, there is not a
single standardized way to analyze such data. Over the
past several years, a number of methods and algo-
rithms have been developed for the differential analy-
sis of genomic expression data where similar problems
exist (35–37). These methods may also be applied to
the analysis of proteomic data. However, one has to
consider the wide dynamic range of protein expression
levels and the impact of non-disease-related
variability on protein expression patterns when adopting
such algorithms.
Unified Maximum Separability Analysis (UMSA) is a
supervised learning algorithm (38, 39) to derive a linear
or nonlinear classification function. UMSA modifies the
support vector machine algorithm to allow for the in-
corporation of data distribution information so that
even with a relatively small dataset, the result tends to
be robust when some of the samples are mislabeled (a
real possibility for clinical samples). The software package
ProPeak from 3Z Informatics LLC (Mt. Pleasant, SC,
USA) utilizes a linear version of UMSA in its three protein
expression data analysis modules (23, 39, 40). In its
supervised “Component Analysis” module, the original
data are projected onto a three-dimensional space in
which the groups of samples are best separated along
the individual axes. In the “Bootstrap Selection” module,
the UMSA algorithm is used to rank individual variables
(proteins) according to their contributions (weights) in the
linear UMSA-derived classification function. To reduce
the dependence of the findings on a particular composition
of the sample population, a bootstrap re-sampling
procedure is used to obtain multiple rankings of the
variables on randomly selected sub-populations of the
total available samples. The results are used to estimate
the average, median and standard deviation of
the multiple ranks. In general, a variable with a high average
rank and a small standard deviation in ranks is preferred
for its informativeness and robustness across
multiple sub-populations. Finally, a third module uses
the backward stepwise selection procedure commonly
used in multivariate logistic regression, replacing the
regression function with the UMSA classification function,
to select differentially expressed variables.
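The bootstrap ranking idea can be sketched independently of UMSA itself. The example below is not the ProPeak implementation: as a stand-in for the UMSA weights it scores each variable by the absolute difference in class means divided by the standard deviation of all values combined, and the synthetic data, function names and number of bootstrap rounds are assumptions made for illustration. Rank 1 denotes the best-separating variable here.

```python
import random
import statistics


def separation_score(values_a, values_b):
    """Stand-in score (not UMSA): |mean difference| / SD of all values combined."""
    spread = statistics.pstdev(values_a + values_b) or 1.0
    return abs(statistics.mean(values_a) - statistics.mean(values_b)) / spread


def rank_features(samples, labels):
    """Return {feature index: rank}; rank 1 is the best-separating feature."""
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        a = [s[j] for s, y in zip(samples, labels) if y == 1]
        b = [s[j] for s, y in zip(samples, labels) if y == 0]
        scores.append(separation_score(a, b))
    order = sorted(range(n_features), key=lambda j: -scores[j])
    return {j: r + 1 for r, j in enumerate(order)}


def bootstrap_ranks(samples, labels, n_rounds=200, seed=0):
    """Re-rank features on resampled data; summarize mean, median and SD of ranks."""
    rng = random.Random(seed)
    n = len(samples)
    ranks = {j: [] for j in range(len(samples[0]))}
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:  # both classes are needed to compute a score
            continue
        xs = [samples[i] for i in idx]
        for j, r in rank_features(xs, ys).items():
            ranks[j].append(r)
    return {
        j: (statistics.mean(r), statistics.median(r), statistics.pstdev(r))
        for j, r in ranks.items()
    }


# Synthetic data: feature 0 differs between classes, features 1-4 are noise.
rng = random.Random(1)
labels = [0] * 20 + [1] * 20
samples = [
    [rng.gauss(3.0 * y, 1.0)] + [rng.gauss(0.0, 1.0) for _ in range(4)]
    for y in labels
]
summary = bootstrap_ranks(samples, labels)
for j, (avg, med, sd) in sorted(summary.items()):
    print(f"feature {j}: mean rank {avg:.2f}, median {med}, SD {sd:.2f}")
```

A variable that keeps a top rank with a small standard deviation across resamples is the kind of candidate the text describes as informative and robust; noise variables tend to have middling and unstable ranks.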
An alternative approach using mass spectrometric
data has been taken, where, rather than use MS peaks
or spots as variables, each data point from a mass
spectral output is treated as a variable (24, 41). The
strategy in this approach is to identify a cluster of
points that will form a distinctive pattern indicative of
disease state. An algorithm based on the self-organiz-
ing map (42, 43) has been applied that uses the genetic
algorithm (44) to iteratively evaluate small sets of data
points corresponding to m/z values from a mass spectrum
to find a pattern that will discriminate the disease
from non-disease samples in the training set (24). Al-
ternatively, each point in the spectrum can be com-
pared within the data set using non-parametric statis-
tics to determine those that have significant variance
between disease and non-disease, followed by a step-
wise discriminant analysis to develop rules for patient
diagnosis (41).
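The point-by-point non-parametric screen can be sketched as follows. This is an illustration under stated assumptions, not the procedure of reference 41: it uses the normal approximation to the Wilcoxon rank-sum (Mann-Whitney) statistic without tie or continuity corrections, the intensities are made-up numbers, and the 0.01 threshold is arbitrary. With thousands of spectral points a multiple-testing correction would also be needed, which is omitted here.

```python
import math


def rank_sum_test(group_a, group_b):
    """Return (z, two-sided p) from the normal approximation to Mann-Whitney U."""
    n_a, n_b = len(group_a), len(group_b)
    # U counts pairs where the value in group_a exceeds the value in group_b
    # (ties count one half).
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    mean_u = n_a * n_b / 2.0
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


def screen_points(disease_spectra, control_spectra, alpha=0.01):
    """Return (index, p) for spectral points that differ between the groups."""
    hits = []
    for i in range(len(disease_spectra[0])):
        d = [s[i] for s in disease_spectra]
        c = [s[i] for s in control_spectra]
        _, p = rank_sum_test(d, c)
        if p < alpha:
            hits.append((i, p))
    return hits


# Hypothetical intensities at three m/z points for eight spectra per group;
# only the middle point is elevated in the disease group.
control = [[10 + k, 5 + k, 7 + (k % 3)] for k in range(8)]
disease = [[10 + k, 25 + k, 7 + ((k + 1) % 3)] for k in range(8)]
hits = screen_points(disease, control)
print(hits)
```

The points that pass such a screen would then feed a second step, such as the stepwise discriminant analysis mentioned above, to build classification rules.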
Clinical Applications of Proteomics – Cancer
Ovarian cancer
Early detection of ovarian cancer has the potential to
significantly reduce the mortality of the disease; however,
currently available tumor markers for ovarian
cancer lack the sensitivity and specificity necessary to
be used for screening in large populations. Currently,
proteomic approaches are under development to ad-
dress this situation. A high-throughput method based
on SELDI-MS technology was used to screen for poten-
tial ovarian cancer biomarkers in serum (40). In this
study, plasma samples collected between 1998 and
2001 from patients with ovarian neoplasms (stages I-III)
were analyzed by SELDI-MS along with plasma con-
trols from women without neoplastic disease. Seven
potential biomarkers (of known molecular weight)
were identified based on their ability to discriminate
disease samples from control samples – this was later
narrowed to three, but none performed better than
CA-125 on an individual basis. However, a combination
of four biomarker candidates performed significantly
better than the individual markers and appeared to be
complementary to CA-125, thus improving sensitivity
when used in combination with the established ovarian
tumor marker.
An alternative SELDI-based approach has been pro-
posed for the diagnosis of ovarian cancer using char-
acteristic proteomic patterns rather than discrete bio-
markers of disease (24). This study collected serum
data from a “training set” of 50 patients with ovarian
cancer and 50 control patients, and then analyzed the
results using an iterative search algorithm designed to
detect proteomic patterns with the ability to discrimi-
nate cancer from non-cancer. The authors assert that a
cluster pattern was found with the ability to completely
segregate those with cancer from those without, and
when tested on an independent set of masked serum
samples consisting of patients with ovarian cancer, un-
affected women, and those with benign disease, the
test yielded a sensitivity of 100% and specificity of 95%.
However, this study drew objections based on the
difficulty of screening a low-risk population (45–47)
and the lack of protein/component identification (48).
An independent group performed their own analysis of
the SELDI data collected in the original experiment
using the Wilcoxon test and, using a combination of
rules, reported that they were able to achieve 100%
sensitivity and specificity in the first training set (41).
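The objection concerning low-risk populations follows from Bayes' rule: when disease prevalence is low, even a small false-positive rate produces far more false positives than true positives. The sketch below applies the sensitivity and specificity quoted above to an assumed prevalence of 1 in 2500; that prevalence is a hypothetical figure chosen for illustration and is not taken from the studies discussed.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity as reported; prevalence is an assumption.
ppv, npv = predictive_values(sensitivity=1.00, specificity=0.95,
                             prevalence=1 / 2500)
print(f"PPV = {ppv:.4f}, NPV = {npv:.4f}")
```

Under this assumed prevalence, fewer than 1 in 100 positive results would correspond to a true case, which illustrates why a specificity that looks high in a case-control set can still be inadequate for population screening.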
Prostate cancer
The discovery of prostate specific antigen (PSA) as a
tumor marker has been an important factor in screen-
ing for prostate cancer (PCA) and management of PCA
patients, but a lack of specificity due to difficulty in dis-
tinguishing PCA from benign prostatic hyperplasia
(BPH) limits its utility for early diagnosis of cancer.
Mass spectrometric methods based on the SELDI plat-
form have been utilized to investigate alternatives
and/or complements to PSA for use in the early detection
of PCA. One such study utilized a pattern-matching
algorithm combined with SELDI data to develop a decision
tree classification system for the disease (49). Data
were collected using serum from 167 PCA patients, 77
patients with BPH, and 82 unaffected control patients
that were matched by age for a training set. The train-
ing set was used to develop a non-protein mass pattern
that was able to correctly classify 96% of the samples.
When the algorithm was tested using a blinded test set,
specificity and sensitivity were determined to be 97%
and 83%, respectively. A separate study (50) used
SELDI-MS to generate data from 386 patients’ sera (98
late stage PCA, 99 early stage PCA, 93 BPH, 96 normal
controls), followed by reduction of the data to binary (x,
y) signals and bioinformatic analysis of those results.
This group found that they were easily able to differen-
tiate between the PCA/BPH group and the control pop-
ulations, but unable to differentiate PCA from BPH. The
authors suggest that this is due in part to the potential
presence of cancerous cells that go undetected in BPH
patients (51), which could lead to misclassification of
the patient samples.
Alternatively, proteomic analysis of PCA has been
performed using a 2DE methodology in an attempt to
find novel proteins that may serve as diagnostic mark-
ers (52). This study examined both normal and malig-
nant prostate tissue taken fresh from 24 radical prosta-
tectomy cases. The 2DE protein maps were analyzed to
categorize differentially expressed proteins, followed
by peptide mass fingerprinting and N-terminal sequencing
for protein identification. The analysis was
able to identify 20 proteins that were lost from tissue in
malignant transformation, including several that had
been previously reported, such as PSA, α1-antichymotrypsin
(ACT), haptoglobin and lactoylglutathione
lyase. In addition, three proteins that had not been
previously identified were found to decrease:
ubiquitin-like NEDD8, calponin, and a follistatin-related
protein.
Breast cancer
Several genes or gene products including HER2/neu,
BRCA, estrogen receptor and p53 have been shown to
have a predictive role in breast cancer, but none of them
to date have shown utility for the early detection of
breast cancer. Proteomic analyses are being employed
in an effort to discover new biomarkers that will lead to
early detection of breast cancer and minimize disease-
related mortality. One such study has been performed
by Li et al. using SELDI-MS (23). In this study, serum
samples from 169 patients were analyzed by SELDI-MS,
including 103 breast cancer patients at different clinical
stages (stage 0 = 4, stage I = 38, stage II = 37, stage III =
24), 41 healthy patients, and 25 patients with benign
breast disease. The results were analyzed using the
ProPeak statistical software package, as well as multivariate
logistic regression. This group was able to define
three markers for breast cancer based on molecular
weight which, when combined, demonstrated a sensi-
tivity and specificity of 93% and 91%, respectively, for
all cancer patients (stages 0-III). Additional studies
based on MS have been performed using nipple aspi-
rate fluid (NAF) as an alternative matrix (53–55) due to
the fact that breast specific proteins are generally more
concentrated in NAF than blood and because mam-
mary ductal cells are the origin for 70–80% of breast
cancers. Two of the studies were performed using
SELDI-MS technology (53, 54), while the other was
performed using a combination of 2DE and MS for pro-
tein characterization (55). These studies were able to
demonstrate breast tumor marker candidates based on
the results from small patient populations.
In contrast to those biomarker studies performed by
analysis of body fluids, additional studies are per-
formed using tissue obtained by biopsy, or dissection
of tissue removed from the patient. One such study has
investigated the transition of normal epithelium to
early stage cancer by proteomic analysis of matched
cancer and non-cancer cells removed from the same
tissue section using laser capture microdissection
(LCM) (56). Samples consisting of up to 100000 cells
captured by LCM were analyzed by 2DE, with differen-
tial protein expression confirmed by image analysis of
the gels as well as MS sequencing of the proteins of in-
terest. There were 57 proteins of interest identified, but
only 10 of them appeared in more than one ductal car-
cinoma sample. The group concludes that while the co-
hort size was not large enough for significant statistical
analysis, the study provides an indication that the in-
formation gained is unique compared to that produced
by nucleic acid-based technologies. Another approach,
as described in a review by Hondermarck et al. (57),
studies the signaling pathway for breast cancer using
proteomic analysis of breast tissue from diseased and
non-diseased patients for discovery and development
of new tumor markers, as well as identification of po-
tential therapeutic targets and potential strategies for
management of breast cancer.
Clinical Applications of Proteomics to other Diseases
Solid organ transplantation
Rejection constitutes the major impediment to success
of solid organ transplantation, but currently there is no
clear cut method for diagnosis of rejection aside from
biopsy, which is invasive and has the potential to cause
significant patient morbidity. Proteomic screening of
blood and other body fluids for biomarker discovery
has the potential to drastically change the way that
transplant patients are managed. One such study has
been performed using proteomic screening of urine
from renal transplant patients to search for biomarkers
of transplant rejection (22). In this study, 34 urine sam-
ples were collected from 32 renal transplant patients at
various stages posttransplantation (17 with acute re-
jection, 15 with no rejection). Specimens were ana-
lyzed by SELDI-MS, with the data analyzed by the Pro-
Peak statistical software package to identify potential
biomarker candidates. The candidates were then
ranked by their ability to differentiate between rejec-
tion and non-rejection as defined by the area under a
receiver operating characteristic (ROC) curve (AUC).
The five best biomarker candidates exhibited AUC val-
ues ranging from 0.804 to 0.839, while a combination of
two other biomarker candidates using a classification
and regression tree algorithm correctly classified 91%
of the sample set, giving a sensitivity and specificity of
83% and 100%, respectively.
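The two performance measures used in this paragraph can be made concrete with a short sketch. The AUC is computed as the fraction of rejection/non-rejection pairs in which the rejection sample has the higher marker value, and sensitivity and specificity are read off at a single cutoff. The scores and the cutoff below are invented for illustration and do not reproduce the urine data of reference 22.

```python
def roc_auc(positive_scores, negative_scores):
    """AUC as the probability that a positive outranks a negative (ties = 1/2)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


def sensitivity_specificity(positive_scores, negative_scores, cutoff):
    """Call a sample positive when its score is at or above the cutoff."""
    true_pos = sum(s >= cutoff for s in positive_scores)
    true_neg = sum(s < cutoff for s in negative_scores)
    return true_pos / len(positive_scores), true_neg / len(negative_scores)


# Hypothetical marker intensities (arbitrary units).
rejection = [0.9, 0.8, 0.7, 0.4]
no_rejection = [0.6, 0.3, 0.2, 0.1]

auc = roc_auc(rejection, no_rejection)
sens, spec = sensitivity_specificity(rejection, no_rejection, cutoff=0.5)
print(f"AUC = {auc:.4f}, sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

The AUC summarizes performance over all possible cutoffs, whereas a sensitivity/specificity pair describes one operating point, which is why the two kinds of figures in the paragraph above are not directly comparable.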
Additional work investigating the effects of cy-
closporine A (CsA) toxicity through proteomic analysis
of kidney tissue has been performed using 2DE (58).
CsA is a commonly used immunosuppressant for post-
transplant patients that carries with it significant risk for
nephrotoxicity. This study uses 2DE analysis to com-
pare protein profiles of tissues from various species in
the presence of toxic levels of CsA. In dogs and mon-
keys, which generally do not exhibit CsA toxicity, there
was no change in renal calbindin levels; however, in
CsA-treated kidneys that exhibited drug-related toxic-
ity, there was a marked decrease in renal calbindin lev-
els in most of the biopsy section. The authors posit that
based on this evidence, renal calbindin is a potential
biomarker for CsA toxicity in transplant patients.
Cardiovascular disease and acute coronary
syndromes
Proteomics in cardiovascular medicine has the poten-
tial to revolutionize the way a cardiologist is able to di-
agnose disease, assess risk, determine prognosis and
target therapeutic strategies among patients with car-
diovascular disease. Early markers of coronary artery
disease or acute coronary syndromes can yield earlier
intervention and improved patient outcomes, while an
enhanced understanding of the mechanism of cardio-
vascular disease may lead to improved treatment prac-
tices. Excellent studies are already underway to opti-
mize the amount of protein extracted and quality of
analysis for cardiovascular proteomics (59, 60). A pro-
teomic study of coronary artery disease (CAD) exam-
ined coronary arteries from 10 patients with CAD and
seven from normal individuals (61). Protein was ex-
tracted from the tissue and analyzed comparatively by
2DE, followed by protein identification for spots of in-
terest with MS. The results indicated an increased ex-
pression of ferritin light chain in the diseased arteries,
which was confirmed by Western blot and quantitative
real-time PCR analysis.
Another interesting application of proteomics to car-
diovascular research is the characterization of protein
expression in the heart during cardiac hypertrophy.
One particular study examined differences in protein
expression between cardiac myocytes with phenyle-
phrine-induced hypertrophy and normal cardiac my-
ocytes (62). In this study, neonatal myocytes from rats
were purified and cultured prior to analysis. Hypertro-
phy was induced in the disease group of cells by addi-
tion of phenylephrine 48 hours prior to cell lysis. After
cell lysis, proteins from both groups of cells were ana-
lyzed by 2DE and MS. The authors were able to identify
eight proteins found to differ in expression level be-
tween the hypertrophy-induced cell population and the
control cell population. A similar study was performed
by Macri et al. (63) where hypertrophy was induced us-
ing endothelin rather than phenylephrine. This study
demonstrated that endothelin-induced hypertrophy
was accompanied by a 30% increase in both myosin
light chains 1 and 2.
Neuromuscular disease and CNS disorders
The application of proteomics to neuroscience has the
potential to make a large impact in the understanding
of pharmacodynamic effects of drugs in the brain, the
progression of neurodegenerative and neuromuscular
disease, the process of complex neuropsychiatric dis-
orders, as well as the identification of potential new
drug targets. A giant step in this direction has been the
generation of a partial protein map for the human brain
using 2-D gels and MS (64). This project has been able
to identify over 300 proteins and protein fragments of a
variety of types, such as enzymes, cytoskeleton proteins,
signaling proteins, chaperones, glial-associated
proteins and apoptosis-related proteins, from brain
tissue. Another study has undertaken
the goal of cataloging the proteome of a quadriceps
muscle using 2DE with an eye toward the study of neu-
romuscular disease (65). Application of proteomics to a
specific CNS disorder has been demonstrated by a
group using proteomic characterization of cere-
brospinal fluid (CSF) to discover potential molecular
markers for Alzheimer’s disease (66). CSF was col-
lected from 10 patients clinically diagnosed with
Alzheimer’s disease (later confirmed by post-mortem pathology)
and five patients who served as normal controls.
The mean age for the disease population was
75.4 years, while the mean age for the controls was
42.3 years. Protein from the CSF was isolated and sub-
jected to 2DE and image analysis, followed by data
analysis by a heuristic clustering algorithm. The au-
thors assert that nine unique protein spots were identi-
fied that were able to distinguish between normal and
Alzheimer’s CSF based on their algorithm.
Conclusion
While a majority of proteomics work has been in the
field of cancer research, it has been applied to or con-
sidered for many other clinical research situations in-
cluding management of atherosclerosis (67), infertility
studies (68), platelet studies (69), and characterization
of hematopoietic stem cell-like entities in the context of
leukemia (70). Drug discovery and development
groups are using proteomics to identify potential drug
targets (71, 72) as well as characterize pharmacody-
namic effects (73–75); proteomics may even help to
detect and avoid adverse drug reactions (76). Pro-
teomics technologies are also finding a place in nutri-
tion research (77, 78). Investigation using proteomics is
underway to study microbial infection (79, 80) – a field
referred to as infectomics – and even to identify novel
components of specialized human tissue such as cilia
(81) or dental tissues (82).
While proteomics technologies certainly serve as use-
ful tools for clinical investigation, it is important to re-
member that proper study design is the key to any kind
of success derived from using these tools. Systematic
errors can be introduced into a study that
artifactually discriminate disease from non-disease,
including differences in sample collection (e.g., arterial
vs. venous blood; plasma vs. serum) or using non-
matched patient groups (e.g., everyone in the disease
group is over 60 years and everyone in the control group
is under 40 years). Possible diurnal variation of protein
expression must be accounted for, so the time of sample
collection should be controlled as closely as possible. In
addition, it is important that patients with benign condi-
tions, or with disease not related to the clinical condition
of interest be included in the control group so that the
study is truly examining differences between those with
a specific disease condition and those without, rather
than measuring differences between generally sick and
generally healthy patients. These types of details can be
difficult to manage, particularly in a retrospective study,
but careful attention to study design will yield a greater
chance of success for proteomic comparisons.
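
The group-matching concern above can be checked before any proteomic comparison is run. The following is a minimal sketch, assuming patient ages are recorded for each group. The function name, the example ages and the 0.1 cut-off (a commonly used rule of thumb for a standardized mean difference) are assumptions made for illustration and are not taken from any study cited in this review.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


def standardized_mean_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Difference in group means divided by the pooled standard deviation.
    Values near zero suggest the groups are balanced on this covariate."""
    pooled_sd = sqrt((stdev(a) ** 2 + stdev(b) ** 2) / 2)
    return (mean(a) - mean(b)) / pooled_sd


# Hypothetical ages (years) for a badly matched study design.
disease_ages = [72, 68, 81, 77, 70]
control_ages = [38, 45, 41, 36, 50]

smd = standardized_mean_difference(disease_ages, control_ages)
imbalanced = abs(smd) > 0.1  # assumed rule-of-thumb threshold

print(f"SMD for age = {smd:.2f}; imbalanced = {imbalanced}")
```

The same check can be repeated for any recorded covariate, such as time of sample collection. A large value flags a variable that could by itself separate the two groups, independent of disease status.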
The scope of the proteomics field is such that not
every study could be addressed properly or adequately
in this Review. A comprehensive review of all clinically
relevant studies would be voluminous. In the past year,
there were over 900 citations in PubMed using the
search term “proteomics”. Refining the search to in-
clude only studies found searching for “clinical pro-
teomics” revealed 220 citations over the past 5 years.
We expect the number of new studies to increase ex-
ponentially, with many new and exciting clinical appli-
cations to be derived from these studies.
References
1. Huber LA. Is proteomics heading in the wrong direction?
Nat Rev Mol Cell Biol 2003; 4:74–80.
2. Pennington SR, Wilkins MR, Hochstrasser DF, Dunn MJ.
Trends Cell Biol 1997; 7:168.
3. Peng J, Gygi SP. Proteomics: the move to mixtures. J Mass
Spectrom 2001; 36:1083–91.
4. Rai AJ, Chan DW. Clinical proteomics: new developments
in clinical chemistry. Lab Med 2001; 25:399–403.
5. Palmer-Toy DE, Kuzdzal S, Chan DW. Proteomic approaches
to tumor marker discovery. In: Diamandis EP, Fritsche HA,
Lilja H, Chan DW, Schwartz MK, editors. Tumor markers:
physiology, pathobiology, technology, and clinical applica-
tions. Washington, DC: AACC Press, 2002:391–400.
6. Hancock W, Apffel A, Chakel J, Hahnenberger K, Choud-
hary G, Traina JA, et al. Integrated genomic/proteomic
analysis. Anal Chem 1999; 71:742A–748A.
7. Zhou H, Watts JD, Aebersold R. A systematic approach to
the analysis of protein phosphorylation. Nat Biotechnol
2001; 19:375–8.
8. Hancock WS, Wu SL, Shieh P. The challenges of develop-
ing a sound proteomics strategy. Proteomics 2002; 2:352–
9.
9. Aebersold R. A mass spectrometric journey into protein
and proteome research. J Am Soc Mass Spectrom 2003;
14:685–95.
10. Corthals GL, Wasinger VC, Hochstrasser DF, Sanchez
JC. The dynamic range of protein expression: a challenge
for proteomic research. Electrophoresis 2000; 21:1104–
15.
11. Harry JL, Wilkins MR, Herbert BR, Packer NH, Gooley AA,
Williams KL. Proteomics: capacity versus utility. Elec-
trophoresis 2000; 21:1071–81.
12. O’Farrell PH. High resolution two-dimensional elec-
trophoresis of proteins. J Biol Chem 1975; 250:4007–21.
13. Klose J. Protein mapping by combined isoelectric focusing
and electrophoresis of mouse tissues. A novel approach to
testing for induced point mutations in mammals. Human-
genetik 1975; 26:231–43.
14. Issaq HJ. The role of separation science in proteomics re-
search. Electrophoresis 2001; 22:3629–38.
15. Gygi SP, Aebersold R. Mass spectrometry and proteomics.
Curr Opin Chem Biol 2000; 4:489–94.
16. Lopez MF. Better approaches to finding the needle in a
haystack: optimizing proteome analysis through automa-
tion. Electrophoresis 2000; 21:1082–93.
17. Karas M, Hillenkamp F. Laser desorption ionization of pro-
teins with molecular masses exceeding 10,000 daltons.
Anal Chem 1988; 60:2299–301.
18. Fenn JB, Mann M, Meng CK, Wong SF, Whitehouse CM.
Electrospray ionization for mass spectrometry of large bio-
molecules. Science 1989; 246:64–71.
19. Merchant M, Weinberger SR. Recent advancements in sur-
face-enhanced laser desorption/ionization-time of flight-
mass spectrometry. Electrophoresis 2000; 21:1164–77.
20. Issaq HJ, Veenstra TD, Conrads TP, Felschow D. The SELDI-
TOF MS approach to proteomics: protein profiling and bio-
marker identification. Biochem Biophys Res Commun
2002; 292:587–92.
21. Aebersold R, Mann M. Mass spectrometry-based pro-
teomics. Nature 2003; 422:198–207.
22. Clarke W, Silverman BC, Zhang Z, Chan DW, Klein AS, Mol-
menti EP. Characterization of renal allograft rejection by
urinary proteomic analysis. Ann Surg 2003; 237:660–4;
discussion 664–5.
23. Li J, Zhang Z, Rosenzweig J, Wang YY, Chan DW. Pro-
teomics and bioinformatics approaches for identification
of serum biomarkers to detect breast cancer. Clin Chem
2002; 48:1296–304.
24. Petricoin EF, Ardekani AM, Hitt BA, Levine PJ, Fusaro VA,
Steinberg SM, et al. Use of proteomic patterns in serum to
identify ovarian cancer. Lancet 2002; 359:572–7.
25. Chakraborty A, Regnier FE. Global internal standard tech-
nology for comparative proteomics. J Chromatogr A 2002;
949:173–84.
26. Cottrell JS. Protein identification by peptide mass finger-
printing. Pept Res 1994; 7:115–24.
27. Gevaert K, Vandekerckhove J. Protein identification meth-
ods in proteomics. Electrophoresis 2000; 21:1145–54.
28. Larsen MR, Roepstorff P. Mass spectrometric identification
of proteins and characterization of their post-translational
modifications in proteome analysis. Fresenius J Anal
Chem 2000; 366:677–90.
29. Xiong L, Regnier FE. Use of a lectin affinity selector in the
search for unusual glycosylation in proteomics. J Chro-
matogr B Analyt Technol Biomed Life Sci 2002; 782:405–18.
30. Mann M, Wilm M. Error-tolerant identification of peptides
in sequence databases by peptide sequence tags. Anal
Chem 1994; 66:4390–9.
31. Fenyo D, Qin J, Chait BT. Protein identification using mass
spectrometric information. Electrophoresis 1998; 19:998–
1005.
32. Perkins DN, Pappin DJ, Creasy DM, Cottrell JS. Probability-
based protein identification by searching sequence data-
bases using mass spectrometry data. Electrophoresis
1999; 20:3551–67.
33. Yates JR 3rd. Database searching using mass spectrome-
try data. Electrophoresis 1998; 19:893–900.
34. Chakravarti DN, Chakravarti B, Moutsatsos I. Informatic
tools for proteome profiling. Biotechniques 2002; Suppl:
4–10, 12–5.
35. Ito T, Sakaki Y. Toward genome-wide scanning of gene ex-
pression: a functional aspect of the Genome Project. Es-
says Biochem 1996; 31:11–21.
36. Going JJ, Gusterson BA. Molecular pathology and future
developments. Eur J Cancer 1999; 35:1895–904.
37. Wallrapp C, Muller-Pillasch F, Micha A, Wenger C, Geng M,
Solinas-Toldo S, et al. Strategies for the detection of dis-
ease genes in pancreatic cancer. Ann N Y Acad Sci 1999;
880:122–46.
38. Vapnik VN. Statistical learning theory. New York: John Wi-
ley & Sons, 1998:401–440.
39. Zhang Z, Page G, Zhang H. Applying classification separa-
bility analysis to microarray data. In: Lin SM JK, editor.
Method of microarray data analysis: papers from CAMDA
‘00. Boston: Kluwer Academic Publishers, 2001:25–26.
40. Rai AJ, Zhang Z, Rosenzweig J, Shih Ie M, Pham T, Fung
ET, et al. Proteomic approaches to tumor marker discov-
ery. Arch Pathol Lab Med 2002; 126:1518–26.
41. Sorace JM, Zhan M. A data review and re-assessment of
ovarian cancer serum proteomic profiling. BMC Bioinfor-
matics 2003; 4:24.
42. Holdaway RM, White MW. Computational neural net-
works: enhancing supervised learning algorithms via self-
organization. Int J Biomed Comput 1990; 25:151–67.
43. Erwin E, Obermayer K, Schulten K. Self-organizing maps:
stationary states, metastability and convergence rate. Biol
Cybern 1992; 67:35–45.
44. Holland JH, editor. Adaptation in natural and artificial sys-
tems: an introductory analysis with applications to biol-
ogy, control, and artificial intelligence, 3rd ed. Cambridge,
MA: MIT Press, 1994.
45. Rockhill B. Proteomic patterns in serum and identification of
ovarian cancer. Lancet 2002; 360:169; author reply 170–1.
46. Elwood M. Proteomic patterns in serum and identification
of ovarian cancer. Lancet 2002; 360:170; author reply 170–1.
47. Pearl DC. Proteomic patterns in serum and identification of
ovarian cancer. Lancet 2002; 360:169–70; author reply
170–1.
48. Diamandis EP. Proteomic patterns in serum and identifica-
tion of ovarian cancer. Lancet 2002; 360:170; author reply
170–1.
49. Adam BL, Qu Y, Davis JW, Ward MD, Clements MA,
Cazares LH, et al. Serum protein fingerprinting coupled
with a pattern-matching algorithm distinguishes prostate
cancer from benign prostate hyperplasia and healthy men.
Cancer Res 2002; 62:3609–14.
50. Yasui Y, Pepe M, Thompson ML, Adam BL, Wright GL Jr,
Qu Y, et al. A data-analytic strategy for protein biomarker
discovery: profiling of high-dimensional proteomic data
for cancer detection. Biostatistics 2003; 4:449–63.
51. Djavan B, Mazal P, Zlotta A, Wammack R, Ravery V, Remzi
M, et al. Pathological features of prostate cancer detected
on initial and repeat prostate biopsy: results of the
prospective European Prostate Cancer Detection study.
Prostate 2001; 47:111–7.
52. Meehan KL, Holland JW, Dawkins HJ. Proteomic analysis
of normal and malignant prostate tissue to identify novel
proteins lost in cancer. Prostate 2002; 50:54–63.
53. Sauter ER, Zhu W, Fan XJ, Wassell RP, Chervoneva I, Du
Bois GC. Proteomic analysis of nipple aspirate fluid to de-
tect biologic markers of breast cancer. Br J Cancer 2002;
86:1440–3.
54. Paweletz CP, Trock B, Pennanen M, Tsangaris T, Magnant C,
Liotta LA, et al. Proteomic patterns of nipple aspirate fluids
obtained by SELDI-TOF: potential for new biomarkers to
aid in the diagnosis of breast cancer. Dis Markers 2001;
17:301–7.
55. Varnum SM, Covington CC, Woodbury RL, Petritis K, Kan-
gas LJ, Abdullah MS, et al. Proteomic characterization of
nipple aspirate fluid: identification of potential biomarkers
of breast cancer. Breast Cancer Res Treat 2003; 80:87–97.
56. Wulfkuhle JD, Sgroi DC, Krutzsch H, McLean K, McGarvey
K, Knowlton M, et al. Proteomics of human breast ductal
carcinoma in situ. Cancer Res 2002; 62:6740–9.
57. Hondermarck H, Dolle L, El Yazidi-Belkoura I, Vercoutter-
Edouart AS, Adriaenssens E, Lemoine J. Functional pro-
teomics of breast cancer for signal pathway profiling and
target discovery. J Mammary Gland Biol Neoplasia 2002;
7:395–405.
58. Aicher L, Wahl D, Arce A, Grenet O, Steiner S. New insights
into cyclosporine A nephrotoxicity by proteome analysis.
Electrophoresis 1998; 19:1998–2003.
59. Stanley BA, Neverova I, Brown HA, Van Eyk JE. Optimizing
protein solubility for two-dimensional gel electrophoresis
analysis of human myocardium. Proteomics 2003; 3:815–
20.
60. McDonough JL, Neverova I, Van Eyk JE. Proteomic analy-
sis of human biopsy samples by single two-dimensional
electrophoresis: Coomassie, silver, mass spectrometry,
and Western blotting. Proteomics 2002; 2:978–87.
61. You SA, Archacki SR, Angheloiu G, Moravec CS, Rao S,
Kinter M, et al. Proteomic approach to coronary athero-
sclerosis shows ferritin light chain as a significant marker:
evidence consistent with iron hypothesis in atherosclero-
sis. Physiol Genomics 2003; 13:25–30.
62. Arnott D, O’Connell KL, King KL, Stults JT. An integrated
approach to proteome analysis: identification of proteins
associated with cardiac hypertrophy. Anal Biochem 1998;
258:1–18.
63. Macri J, Rapundalo ST. Application of proteomics to the
study of cardiovascular biology. Trends Cardiovasc Med
2001; 11:66–75.
64. Lubec G, Krapfenbauer K, Fountoulakis M. Proteomics in
brain research: potentials and limitations. Prog Neurobiol
2003; 69:193–211.
65. van den Heuvel LP, Farhoud MH, Wevers RA, van Engelen
BG, Smeitink JA. Proteomics and neuromuscular dis-
eases: theoretical concept and first results. Ann Clin
Biochem 2003; 40:9–15.
66. Choe LH, Dutt MJ, Relkin N, Lee KH. Studies of potential
cerebrospinal fluid molecular markers for Alzheimer’s dis-
ease. Electrophoresis 2002; 23:2247–51.
67. Ballantyne CM. Newer risk markers and surrogate end-
points in atherosclerosis management. Clin Cardiol 2001;
24:III13–7.
68. Bohring C, Krause W. Immune infertility: towards a better
understanding of sperm (auto)-immunity: the value of pro-
teomic analysis. Hum Reprod 2003; 18:915–24.
69. Maguire PB, Fitzgerald DJ. Platelet proteomics. J Thromb
Haemost 2003; 1:1593–601.
70. Ota J, Yamashita Y, Okawa K, Kisanuki H, Fujiwara S,
Ishikawa M, et al. Proteomic analysis of hematopoietic
stem cell-like fractions in leukemic disorders. Oncogene
2003; 22:5720–8.
71. Frank R, Hargreaves R. Clinical biomarkers in drug discov-
ery and development. Nat Rev Drug Discov 2003; 2:566–
80.
72. Colburn WA. Biomarkers in drug discovery and develop-
ment: from target identification through drug marketing. J
Clin Pharmacol 2003; 43:329–41.
73. Bruneau JM, Maillet I, Tagat E, Legrand R, Supatto F, Fudali
C, et al. Drug induced proteome changes in Candida albi-
cans: comparison of the effect of beta(1,3) glucan synthase
inhibitors and two triazoles, fluconazole and itraconazole.
Proteomics 2003; 3:325–36.
74. Chen ST, Pan TL, Tsai YC, Huang CM. Proteomics reveals
protein profile changes in doxorubicin-treated MCF-7 hu-
man breast cancer cells. Cancer Lett 2002; 181:95–107.
75. Witzmann FA, Grant RA. Pharmacoproteomics in drug de-
velopment. Pharmacogenomics J 2003; 3:69–76.
76. Wilkins MR. What do we want from proteomics in the de-
tection and avoidance of adverse drug reactions. Toxicol
Lett 2002; 127:245–9.
77. Daniel H. Genomics and proteomics: importance for the
future of nutrition research. Br J Nutr 2002; 87 Suppl 2:
S305–11.
78. van Ommen B, Stierum R. Nutrigenomics: exploiting sys-
tems biology in the nutrition and health arena. Curr Opin
Biotechnol 2002; 13:517–21.
79. O’Connor CD, Adams P, Alefounder P, Farris M, Kinsella N,
Li Y, et al. The analysis of microbial proteomes: strategies
and data exploitation. Electrophoresis 2000; 21:1178–86.
80. Huang SH, Triche T, Jong AY. Infectomics: genomics and
proteomics of microbial infections. Funct Integr Genomics
2002; 1:331–44.
81. Ostrowski LE, Blackburn K, Radde KM, Moyer MB,
Schlatzer DM, Moseley A, et al. A proteomic analysis of hu-
man cilia: identification of novel components. Mol Cell
Proteomics 2002; 1:451–65.
82. Hubbard MJ, Kon JC. Proteomic analysis of dental tissues.
J Chromatogr B Analyt Technol Biomed Life Sci 2002;
771:211–20.
Received 23 October 2003, revised 28 October 2003,
accepted 6 November 2003
Corresponding author: Prof. Daniel W. Chan, Johns Hopkins
School of Medicine, Department of Pathology, 600 N. Wolfe
Street/Meyer B-125, Baltimore, MD 21287, USA
Phone: +1 410-955-6304, Fax: +1 410-955-0767,
E-mail: dchan@jhmi.edu