Article

Global Survey of Organ and Organelle Protein Expression in Mouse: Combined Proteomic and Transcriptomic Profiling


Abstract

Organs and organelles represent core biological systems in mammals, but the diversity in protein composition remains unclear. Here, we combine subcellular fractionation with exhaustive tandem mass spectrometry-based shotgun sequencing to examine the protein content of four major organellar compartments (cytosol, membranes [microsomes], mitochondria, and nuclei) in six organs (brain, heart, kidney, liver, lung, and placenta) of the laboratory mouse, Mus musculus. Using rigorous statistical filtering and machine-learning methods, the subcellular localization of 3274 of the 4768 proteins identified was determined with high confidence, including 1503 previously uncharacterized factors, while tissue selectivity was evaluated by comparison to previously reported mRNA expression patterns. This molecular compendium, fully accessible via a searchable web-browser interface, serves as a reliable reference of the expressed tissue and organelle proteomes of a leading model mammal.


... CREB is a transcriptional regulator of BDNF (45–47). A drop in serum BDNF levels during the initial stages of SSRI treatment may be related to a later SSRI response in adolescents with severe depressive disorders, according to a different investigation on the relationship between BDNF levels and SSRIs (48). There is a correlation between BDNF levels and depression in adolescents. ...
... Granisetron appears to have an antidepressant effect per se in animal research, at least at modest dosages. Granisetron, a serotonin 5-HT3 receptor antagonist, also counteracts the side effects of SSRIs on the digestive system (48). Tricyclic antidepressants like imipramine, desipramine, and doxepin, monoamine oxidase inhibitors (MAOIs) like phenelzine, and SSRIs like fluoxetine have been demonstrated in previous investigations to uncompetitively decrease 5-HT3 transreceptor currents. ...
... Antidepressants may act like 5-HT3 receptor antagonists by inhibiting the 5-HT3 receptor. Another example of granisetron's potential antidepressant impact (48, 66–69). ...
Article
Background: The development of neurotoxicity in healthy, non-targeted brain tissue exposed to radiation during cranial radiotherapy (RT) is the most frequent radiation-induced adverse effect. The 5-hydroxytryptamine-3 (5-HT3) receptor antagonists may also have a range of neuroprotective, anti-inflammatory, and antiphlogistic properties in addition to their anti-emetic effects. Materials and Methods: Three study groups were formed: Group 1: normal control (orally fed control); Group 2: irradiation (IR)-only (IR+Saline); Group 3: IR+Granisetron, with whole-brain IR and granisetron 1 mg/kg/day (Merck) administered orally. All treatments were given for 15 days, and the 15-day period concluded with behavioral testing. In the whole-brain IR-only (placebo) group, a substantial deterioration was seen in all studied marker levels and behavioral test results. Results: Compared to the IR-only group, all of these biochemical indicators significantly improved in the granisetron group (IR+Granisetron) and returned to the normal levels of the control group. In behavioral test analyses, a substantial decline in the open field, passive avoidance learning, and social recognition tests was seen in the IR-only group compared to the healthy control group, whereas an improvement was seen in the IR+Granisetron group. In addition, the IR-only group showed a reduction in hippocampal neurons and Purkinje neurons as well as an increase in hippocampal gliosis, whereas the IR+Granisetron group showed an improvement and a return to the normal control group counts. Conclusion: In summary, we found that granisetron had neuroprotective properties in a rat model of radiation-induced brain damage.
... In research, increasing the dynamic range of protein detection measurements is imperative if protein expression levels of a single sample or cell have to be set in context of each other. 8 There, splitting the sample for multiple measurements is impossible, and correlating results from different measurements may impede the accuracy. The dynamic range of protein expression inside cells has been estimated to span up to 12 orders of magnitude. ...
... Figure 1.8 shows the principle setup. A couple of different methods are used to immobilize the biomolecules on the metal surface. ...
... The Spearman correlation for all 20 proteins and the corresponding transcripts in each cluster can be seen in Figure 3.20. Most clusters have correlations above 0.8. Cluster 8 shows a low correlation between the two datasets; however, when compared with Figure 3.17, this cluster seemed to be small and contained only a few cells. ...
Thesis
As fundamental building blocks of every organism, proteins have long been among the most important objects of study in biology. Depending on the context and requirements of the analysis, a range of methods is available to scientists. While immunofluorescence allows individual proteins to be localized in situ, co-immunoprecipitation and surface plasmon resonance enable more precise quantification of proteins and of their interactions with other proteins. The proximity ligation assay (PLA) combines both advantages: by forming fluorescent DNA spots inside the cell, it makes possible the in situ quantification of individual proteins as well as of protein interactions.
Chapter 3.1.1 (The Bat Influenza Entry Path into the Human Cell) demonstrates the capabilities of PLA using the recently discovered bat influenza virus H18N11: the interaction between the human cell-surface protein HLA-DR and the bat hemagglutinin H18 is successfully shown. This suggests that H18N11 can also infect human cells via the interaction with HLA-DR. The project also investigates the possibility of AI-based PLA image analysis. Various classification methods are compared with one another and with the classical algorithmic analysis. The comparison shows that AI-based analysis detects weak and clustered PLA spots better but takes considerably more time than the algorithmic approach. Further optimization is needed to make AI-based analysis competitive in this respect.
The project Interactions Of The Insulin Inhibitory Receptor in Chapter 3.1.2 applies PLA to identify interaction partners of the newly discovered receptor Inceptor. The study shows that the protein interacts with itself (homodimerization) and with IRS2, and forms a heterodimer with the insulin receptor. This heterodimer is found predominantly in spatial proximity to the endoplasmic reticulum. The observation suggests that Inceptor acts as an internalization signal for the insulin receptor, leading to its degradation and thereby to desensitization of the cell to insulin. In addition, the study shows how PLA can be used to investigate the dynamics of proteins and of protein interactions. A time series with insulin and glucose stimulation shows shifts of the PLA spots of different interactions within the cell and thereby provides insight into the internalization mechanisms of Inceptor.
Chapter 3.1.3 (Blocking-PLA) uses the working principle of PLA to obtain submolecular information about interactions in the context of epitope mapping: the Inceptor homodimer is incubated with one of twelve different monoclonal antibodies at a time and analyzed. A reduced PLA signal then indicates blockade of the interaction domain. Three antibodies are found that reduce the signal by more than 40%. These antibodies could potentially be used in future experiments to influence the internalization of the insulin receptor.
Chapter 3.1.4 (Cite-Seq) applies the underlying principle of PLA, the DNA-conjugated antibody, to identify proteins in single cells with the Cite-Seq method. The method is based on single-cell RNA sequencing, in which, within a droplet microfluidics setup, the nucleic acids of single cells are released onto microbeads that enable further processing of the nucleic acids and preparation for DNA sequencing. Tagging proteins with specific, known DNA strands opens this nucleic acid-based method to the world of proteomics. The project demonstrates how 20 proteins can be successfully quantified simultaneously in single cells. Bioinformatic methods are then used to draw conclusions about the cell identity of individual cells within a large population.
Conjugating proteins with DNA is the prerequisite for performing PLA, Cite-Seq, and other methods. Efficient conjugation is therefore indispensable for ensuring high-quality results and lowering experimental costs. Chapter 3.2.1 (Improving DNA Antibody Conjugation) develops an improved conjugation method. Based on an optimized SPAAC chemistry (strain-promoted azide-alkyne cycloaddition), the protein and DNA strand are joined and, for purification and quality control, passed through an anion-exchange chromatography column. Only single conjugates (consisting of one DNA molecule and one protein molecule) are collected. The efficiency of the method is then confirmed by PLA experiments.
In Chapter 3.2.2, the Cite-Seq method is extended with the ability to detect interactions in single cells. A principle of PLA, rolling circle amplification (RCA), is used to amplify the signal. The study shows that the developed principle works. Increasing its efficiency, however, remains to be achieved in future experiments.
... Subcellular fractionations (performed in triplicate for each cell line) were performed as previously described [20] with minor modifications. Chordoma cell lines (U-CH17P, U-CH17M, U-CH17S and U-CH11R) were pelleted and washed three times with PBS. ...
... Liquid chromatography was directly coupled to an Orbitrap Fusion Tribrid (Thermo Scientific) and data was acquired as previously described [24]. Raw files were searched using the MaxQuant software (version 1.5.8.3) against a Uniprot human sequence database (number of sequences 42,041) with an FDR set to 1% for positive peptide spectral matches and protein identification using a target-decoy strategy [20]. Searches were performed with maximum of two missed cleavages, oxidation of methionine residues as a variable modification, and carbamidomethylation of cysteine residues as a fixed modification. ...
Article
Chordomas are clinically aggressive tumors with a high rate of disease progression despite maximal therapy. Given the limited therapeutic options available, there remains an urgent need for the development of novel therapies to improve clinical outcomes. Cell surface proteins are attractive therapeutic targets yet are challenging to profile with common methods. Four chordoma cell lines were analyzed by quantitative proteomics using a differential ultracentrifugation organellar fractionation approach. A subtractive proteomics strategy was applied to select proteins that are plasma membrane enriched. Systematic data integration prioritized PLA2R1 (secretory phospholipase A2 receptor–PLA2R1) as a chordoma-enriched surface protein. The expression profile of PLA2R1 was validated across chordoma cell lines, patient surgical tissue samples, and normal tissue lysates via immunoblotting. PLA2R1 expression was further validated by immunohistochemical analysis in a richly annotated cohort of 25-patient tissues. Immunohistochemistry analysis revealed that elevated expression of PLA2R1 is correlated with poor prognosis. Using siRNA- and CRISPR/Cas9-mediated knockdown of PLA2R1, we demonstrated significant inhibition of 2D, 3D and in vivo chordoma growth. PLA2R1 depletion resulted in cell cycle defects and metabolic rewiring via the MAPK signaling pathway, suggesting that PLA2R1 plays an essential role in chordoma biology. We have characterized the proteome of four chordoma cell lines and uncovered PLA2R1 as a novel cell-surface protein required for chordoma cell survival and association with patient outcome.
... Subcellular fractionations were performed as previously described (13,14) with minor modifications. Chordoma cell lines (U-CH17P, U-CH17M, U-CH17S and U-CH11R) were pelleted and washed three times with PBS. ...
... Liquid chromatography was directly coupled to an Orbitrap Fusion Tribrid (Thermo Scientific) and data was acquired as previously described (15,16). Raw files were searched using the MaxQuant software (version 1.5.8.3) against a Uniprot human sequence database (number of sequences 42,041) with an FDR set to 1% for positive peptide spectral matches and protein identification using a target-decoy strategy (14). Searches were performed with maximum of two missed cleavages, oxidation of methionine residues as a variable modification, and carbamidomethylation of cysteine residues as a fixed modification. ...
Preprint
Background: Chordomas are rare, clinically aggressive tumors with a median survival of 6-7 years and a high rate of disease progression despite maximal surgery and radiotherapy. Given the limited options available to prevent and treat progression and relatively poor prognosis, there remains an urgent need for the development of novel therapies to improve clinical outcomes. Cell-surface proteins are attractive therapeutic targets due to their accessible subcellular localization. Methods: Here, we used a proteomics approach to identify novel chordoma-specific cell-surface protein markers. Four established chordoma cell lines were analyzed by quantitative proteomics using a comprehensive differential ultracentrifugation organellar fractionation approach. A subtractive proteomics strategy was applied to select proteins that are plasma membrane enriched. Using commercially available antibodies, the expression profiles of these cell-surface proteins were validated across chordoma cell lines, patient surgical tissue samples, and normal tissue lysates via Western blotting. The candidates were further validated by immunohistochemical analysis in a 25-patient tissue cohort. Finally, the essentiality of these candidates for in vitro chordoma growth was evaluated. Results: Mass spectrometry-based proteomics identified 120 high-confidence cell-surface proteins in four established chordoma cell line models. Systematic data integration prioritized two chordoma-specific cell surface proteins for further interrogation. Immunohistochemistry in a richly annotated cohort of chordoma tumor tissues revealed that PLA2R1 and SLC6A12 are broadly expressed in chordoma patient samples. Higher expression of PLA2R1 correlated with poor prognosis whereas SLC6A12 expression was significantly enriched in skull-base chordomas compared to those arising in the spine.
Using a siRNA-mediated knockdown of PLA2R1, we demonstrated significant inhibition of cell growth and colony-forming ability, suggesting these proteins play an essential role in chordoma biology. Conclusion: We have comprehensively elucidated the proteome of four established chordoma cell lines. Subtractive proteomics and integrative data mining revealed novel cell-surface proteins required for chordoma cell survival and associated with clinical parameters in a small chordoma tissue sample cohort.
... Such bonds facilitate interactions in cis, which lead to dimerization of the protein, and in trans, which facilitate binding between different neurons [78]. Various studies have described interactions of the extracellular part of NCAM2 with prion protein (Prp), beta-amyloid peptide, fibroblast growth factor receptor (FGFR), epidermal growth factor receptor (EGFR), beta-site APP cleaving enzyme 1 (BACE1), Nogo and granulin (GRN) [70,78–82]. Other studies have shown interactions of the cytosolic tail of NCAM2.1 with members of the 14-3-3 protein family; proto-oncogene tyrosine-protein kinase, c-Src; microtubule-associated proteins, MAPs; neurofilaments, NFs; calcium/calmodulin-dependent protein kinase type II, CaMKII; and F-actin-capping protein, CAPZ [69,70,76,78,81–83], Figure 1. ...
Article
Although it has been over 20 years since Neural Cell Adhesion Molecule 2 (NCAM2) was identified as the second member of the NCAM family with a high expression in the nervous system, the knowledge of NCAM2 is still eclipsed by NCAM1. The first studies with NCAM2 focused on the olfactory bulb, where this protein has a key role in axonal projection and axonal/dendritic compartmentalization. In contrast to NCAM1, NCAM2’s functions and partners in the brain during development and adulthood have remained largely unknown until not long ago. Recent studies have revealed the importance of NCAM2 in nervous system development. NCAM2 governs neuronal morphogenesis and axodendritic architecture, and controls important neuron-specific processes such as neuronal differentiation, synaptogenesis and memory formation. In the adult brain, NCAM2 is highly expressed in dendritic spines, and it regulates synaptic plasticity and learning processes. NCAM2’s functions are related to its ability to adapt to the external inputs of the cell and to modify the cytoskeleton accordingly. Different studies show that NCAM2 interacts with proteins involved in cytoskeleton stability and proteins that regulate calcium influx, which could also modify the cytoskeleton. In this review, we examine the evidence that points to NCAM2 as a crucial cytoskeleton regulation protein during brain development and adulthood. This key function of NCAM2 may offer promising new therapeutic approaches for the treatment of neurodevelopmental diseases and neurodegenerative disorders.
... (Figure 1B, bottom). As expected, high-abundance proteins were observed in a larger fraction of samples, replicating previous mass spectrometry results (Kislinger et al., 2006; Liu et al., 2004). Proteins encoded by most prostate cancer driver genes were detected in over 70% of the analyzed tumors, including MED12, FOXA1, NKX3-1, and PTEN, among others. ...
... Searches were performed with a maximum of two missed cleavages, carbamidomethylation of cysteine as fixed modification and oxidation of methionine as variable modification. False discovery rate (FDR) was set to 1% for peptide spectral matches and protein identification using a target-decoy strategy (Kislinger et al., 2006). The ProteinGroup.txt ...
Article
Highlights: A comprehensive proteomic analysis of localized prostate cancers; integration of all levels of the central dogma (DNA / RNA / protein); ETS fusions have divergent effects on transcriptome and proteome; combining genomics and proteomics improves biomarker performance.
In Brief: Sinha et al. determine the proteogenomic landscape of localized, intermediate-risk prostate cancers and show that the presence of ETS gene fusions has one of the strongest effects on the proteome. Prognostic biomarkers that integrate multi-omics significantly outperform those comprised of a single data type.
Summary: DNA sequencing has identified recurrent mutations that drive the aggressiveness of prostate cancers. Surprisingly, the influence of genomic, epigenomic, and transcriptomic dysregulation on the tumor proteome remains poorly understood. We profiled the genomes, epigenomes, transcriptomes, and proteomes of 76 localized, intermediate-risk prostate cancers. We discovered that the genomic subtypes of prostate cancer converge on five proteomic subtypes, with distinct clinical trajectories. ETS fusions, the most common alteration in prostate tumors, affect different genes and pathways in the proteome and transcriptome. Globally, mRNA abundance changes explain only ~10% of protein abundance variability. As a result, prognostic biomarkers combining genomic or epigenomic features with proteomic ones significantly outperform biomarkers comprised of a single data type.
... Numerous neuropsychiatric illnesses have been found to disrupt synaptic homeostasis (42,52). SV2s are expressed in the mitochondrial compartment in addition to vesicles, and a damaged mitochondrial structure may have an impact on the downregulation of SV2A signaling (53,54). It has also been suggested that levetiracetam may have additional therapeutic benefits for treating late-onset Alzheimer's disease (AD), where the SV2A protein is a specific target for the drug. ...
... Notably, swine adaptation and swine-adapted IAVs are closely related to human pandemics. All of the last five recorded influenza pandemics were caused by avian-origin, swine-origin, or reassortant IAVs (Reid et al. 2004; Kislinger et al. 2006; Bragstad et al. 2011; Long et al. 2019). Thus, it is of great importance to predict the adaptation of avian or swine IAVs to humans. ...
... Nonetheless, proteomic data have rarely been used to model these phenotypes compared with genomic data [58]. Although the partial correlation between protein and mRNA abundance has already been established to be very poor, the flow of information from genome to proteome in tumours remains unexplored in cancer biology [59,60]. It is thus very important to have high-throughput methods for exploring the cancer proteome in order to study changes in signalling pathways, protein isoforms and post-translational modifications [58]. ...
Article
Abstract: Breast cancer has been recognized as a global health problem and a complex disease entity. When considering all cancer types, breast cancer is becoming the most frequent type of cancer affecting women of all ages and across the whole world. In the era before next-generation sequencing, diagnostic approaches employing genetic analysis were focused on individual factors and were unable to illustrate the comprehensive network of biological complexity encountered in this aggressive disease. The introduction of next-generation sequencing has facilitated the acquisition of high-resolution whole-genome, exome and transcriptome sequencing data. This previously unattainable data enabled health professionals to gain a global view of breast cancer genomes and the full spectrum of its involvement. With the utilization of next-generation sequencing data from a large number of breast cancer patients, it is expected that the exact signaling pathways leading to the oncogenic transformation of breast tissue will become better understood. Next-generation sequencing promises to revolutionize breast cancer research, therapy and diagnosis. By combining recent technological advances with the ability to integrate data from the areas of genomics, transcriptomics and proteomics, scientists have a greater opportunity to further investigate tumor evolution, gene expression and protein involvement. In this article, we have reviewed the recent technological advancements in breast cancer research and highlighted the contributions of next-generation sequencing technologies to such advancements. Keywords: Breast Cancer; Mass Spectrophotometer; NGS and Technology
... Examples of these are the microarray-based clinical tests, such as MammaPrint (Agendia) [31] and ColoPrint (Agendia) [32] for breast and colon cancer recurrence risk prognosis, respectively, and the rising RNA seq-based tests, which are having significant prognostic and therapeutic relevance to cancer due to their capability of high reproducibility, accuracy, and precision [30,33,34]. Measuring mRNA expression levels is cheaper but is insufficient to determine protein levels because correlations between mRNA expression and protein abundance are relatively low [35–37]. Correlations between mRNA and protein levels vary greatly among genes depending on regulatory processes that govern the rates of transcription, translation, posttranscriptional and posttranslational modifications, and protein/mRNA degradation. ...
Article
Simple Summary: Around 80% of skin cancer deaths are due to melanoma. An accurate prognosis of melanoma clinical behavior from primary tumors is important for therapeutic patient management, currently based on histopathological features. The aim of our retrospective study was to investigate the clinical significance of IGF2BP3 mRNA and protein expression in melanoma progression and to evaluate which quantification method, RT-qPCR or immunohistochemistry, provides a more reliable prognostic value of IGF2BP3 expression in primary tumors. We found that IGF2BP3 mRNA expression correlated better with clinicopathologic melanoma features than the corresponding proteins and that patients with higher IGF2BP3 mRNA levels were at more risk for earlier development of metastasis, confirming its impact on melanoma survival. Our findings support the use of IGF2BP3 mRNA levels as an independent prognostic biomarker and the implementation of its RT-qPCR analysis for routine melanoma assessment, even for the earliest stages, to improve melanoma clinical outcomes and individualized treatment. Abstract: Screening for prognostic biomarkers is crucial for clinical melanoma management. Insulin-like growth factor-II mRNA-binding protein 3 (IGF2BP3) has emerged as a potential melanoma diagnostic and prognostic biomarker. It is commonly tested by immunohistochemistry (IHC). Our study retrospectively examines IGF2BP3 mRNA and protein expression in primary melanomas, their correlation with clinicopathologic factors, clinical outcome, and selected miRNAs expression, and their efficiency in predicting melanoma progression and survival. RT-qPCR and IHC on IGF2BP3 expression were performed in 61 cryopreserved and 63 formalin-fixed paraffin-embedded primary melanomas, respectively, and correlated to clinicopathologic factors, distant metastasis-free survival (DMFS), and melanoma-specific survival (MSS). The correlation between RT-qPCR and IHC was significant but moderate.
IGF2BP3 mRNA showed a stronger association with clinicopathologic factors (Breslow thickness, ulceration, mitosis rate, growth phase, development of metastasis, and melanoma-specific survival) than its protein counterpart. Interestingly, higher IGF2BP3 mRNA expression was detected in primary melanomas that further metastasized to distant sites and was an independent prognostic factor for the risk of unfavorable DMFS and MSS. RT-qPCR outperformed IHC in sensitivity and in predicting worse clinical outcomes. Therefore, RT-qPCR may successfully be implemented for routine IGF2BP3 assessment to select melanoma patients with a higher risk of developing distant metastasis and dying of melanoma.
... Proteomic analyses have also been performed in kidney tissues [73]. These studies showed a disparate expression of proteins in the renal cortex and renal medulla and detected thymosin β4 as a marker of sclerosis in animal models of FSGS [81–83]. Nevertheless, the use of proteomics in CKD has been proposed as an alternative to kidney biopsy in some clinical contexts that would help the nephrologist to detect kidney disease early on, select the appropriate treatment and, hopefully, monitor the treatment effect over time [84]. ...
Article
Chronic kidney disease (CKD) patients are characterized by a high residual risk for cardiovascular (CV) events and CKD progression. This has prompted the implementation of new prognostic and predictive biomarkers with the aim of mitigating this risk. The ‘omics’ techniques, namely genomics, proteomics, metabolomics, and transcriptomics, are excellent candidates to provide a better understanding of pathophysiologic mechanisms of disease in CKD, to improve risk stratification of patients with respect to future cardiovascular events, and to identify CKD patients who are likely to respond to a treatment. Following such a strategy, a reliable risk of future events for a particular patient may be calculated and consequently the patient would also benefit from the best available treatment based on their risk profile. Moreover, a further step forward can be represented by the aggregation of multiple omics information by combining different techniques and/or different biological samples. This has already been shown to yield additional information by revealing with more accuracy the exact individual pathway of disease.
... A further evolution of this protocol, hyperplexed LOPIT (hyperLOPIT), used a more complex density gradient to study pluripotent E14TG2a mouse embryonic stem cells and U-2 OS human bone osteosarcoma cells, and demonstrated higher subcellular resolution than any other MS-based spatial proteomics method available to date [39,254]. HyperLOPIT has also been employed to comprehensively map the subcellular organization of S. cerevisiae, cyanobacterium (Synechocystis), and Toxoplasma gondii [255–257]. The method has also been coupled to FFE (see section 1.1.3.) to analyze the protein composition of Golgi sub-compartments in A. thaliana cell-suspension cultures [258]. ...
Article
The internal environment of cells is molecularly crowded, which requires spatial organization via subcellular compartmentalization. These compartments harbor specific conditions for molecules to perform their biological functions, such as coordination of the cell cycle, cell survival, and growth. This compartmentalization is also not static, with molecules trafficking between these subcellular neighborhoods to carry out their functions. For example, some biomolecules are multifunctional, requiring an environment with differing conditions or interacting partners, and others traffic to export such molecules. Aberrant localization of proteins or RNA species have been linked to many pathological conditions, such as neurological, cancer and pulmonary diseases. Differential expression studies in transcriptomics and proteomics are relatively common, but the majority have overlooked the importance of subcellular information. Additionally, subcellular transcriptomics and proteomics data do not always co-locate due to the biochemical processes that occur during and after translation, highlighting the complementary nature of these fields. In this review, we discuss and directly compare the current methods in spatial proteomics and transcriptomics, which include sequencing- and imaging-based strategies, to give the reader an overview of the current tools available. We also discuss current limitations of these strategies, as well as future developments in the field of spatial -omics.
... In addition to the proteins coded by 15q13.3 locus, STRING analysis identified 11 other proteins that either interact with KLF13, are coexpressed, or are mentioned together in research literature ( Figure 5A) (53,54). We also included Fbxw7, Sp1, Crebbp, and Gsk3b, which were not identified by STRING analysis, but have been shown to affect stability and expression of KLF13 (55,56). ...
Article
Background: A number of rare copy-number variants (CNVs) have been linked to neurodevelopmental disorders. However, since CNVs encompass many genes, it is often difficult to identify the mechanisms that lead to developmental perturbations. Method: We used 15q13.3 microdeletion to propose and validate a novel strategy to predict the impact of CNV genes on brain development that could further guide functional studies. We analyzed single-cell transcriptomics datasets containing cortical interneurons to identify their developmental vulnerability to 15q13.3 microdeletion, which was validated in mouse models. Results: We show that Klf13, but not other 15q13.3 genes, is expressed by precursors and neuroblasts in the medial and caudal ganglionic eminences (MGE and CGE) during development, with a peak of expression at E13.5 and E18.5, respectively. In contrast, in the adult mouse brain, Klf13 expression is negligible. Using Df(h15q13.3)/+ and Klf13+/- embryos, we observed a precursor subtype-specific impairment in proliferation in the MGE and CGE at E13.5 and E17.5, respectively, corresponding to vulnerability predicted by Klf13 expression patterns. Finally, Klf13+/- mice showed a layer-specific decrease in parvalbumin and somatostatin cortical interneurons accompanied by changes in locomotor and anxiety-related behavior. Conclusions: We show that the impact of 15q13.3 microdeletion on precursor proliferation is grounded in a reduction in Klf13 expression. The lack of Klf13 in Df(h15q13.3)/+ cortex might be the major reason for perturbed density of cortical interneurons. Thus, the behavioral defects seen in 15q13.3 microdeletion could stem from a developmental perturbation owing to selective vulnerability of cortical interneurons during sensitive stages of their development.
... A landmark single-organelle spatial proteomics paper described the systematic study of the human centrosome using intensity-based, label-free protein correlation profiling of consecutive density gradient fractions 30 . Several adaptations of subtractive proteomics have used SILAC ratios produced by spiking a differentially labelled internal standard into each fraction 33,34,199,200 ; this approach, with or without stable isotope labelling, has been used to characterize the proteome of peroxisomes 38,201 , nuclei 202 , autophagosomes 33,199 , lipid droplets 34 , vesicles 200 and mitochondria 35,203 . ...
Article
The eukaryotic cell is compartmentalized into subcellular niches, including membrane-bound and membrane-less organelles. Proteins localize to these niches to fulfil their function, enabling discrete biological processes to occur in synchrony. Dynamic movement of proteins between niches is essential for cellular processes such as signalling, growth, proliferation, motility and programmed cell death, and mutations causing aberrant protein localization are associated with a wide range of diseases. Determining the location of proteins in different cell states and cell types and how proteins relocalize following perturbation is important for understanding their functions, related cellular processes and pathologies associated with their mislocalization. In this Primer, we cover the major spatial proteomics methods for determining the location, distribution and abundance of proteins within subcellular structures. These technologies include fluorescent imaging, protein proximity labelling, organelle purification and cell-wide biochemical fractionation. We describe their workflows, data outputs and applications in exploring different cell biological scenarios, and discuss their main limitations. Finally, we describe emerging technologies and identify areas that require technological innovation to allow better characterization of the spatial proteome.
... Nonetheless, proteomic data have rarely been used to model these phenotypes compared with genomics data [58]. Although the partial correlation between protein and mRNA abundance has already been established to be very poor, the flow of information from genome to proteome in tumours remains unexplored in cancer biology [59,60]. It is thus very important to have high-throughput methods for exploring the cancer proteome in order to study changes in signalling pathways, protein isoforms and post-translational modifications [58]. ...
... Despite tremendous amounts of genomics and transcriptomics datasets, corresponding information on cardiac proteomes and their differences across models is still scarce [28]. Previous proteomics studies of model organisms have illuminated portions of their cardiac proteomes [29,30], often with a particular focus such as cardiac development [31], disease models [32], subcellular protein expression [33,34], phosphorylation [35][36][37], protein turnover [12,38], or smaller mammals and amphibians [16]. The deepest human heart dataset obtained to date presents an impressive atlas [39], but its usefulness for quantitative comparison is limited as it was acquired from tissue collected several days postmortem [14]. ...
Article
Delineating human cardiac pathologies and their basic molecular mechanisms relies on research conducted in model organisms. Yet translating findings from preclinical models to humans presents a significant challenge, in part due to differences in cardiac protein expression between humans and model organisms. Proteins immediately determine cellular function, yet their large-scale investigation in hearts has lagged behind that of genes and transcripts. Here, we set out to bridge this knowledge gap: By analyzing protein profiles in humans and commonly used model organisms across cardiac chambers, we determine their commonalities and regional differences. We analyzed cardiac tissue from each chamber of human, pig, horse, rat, mouse, and zebrafish in biological replicates. Using mass spectrometry–based proteomics workflows, we measured and evaluated the abundance of approximately 7,000 proteins in each species. The resulting knowledgebase of cardiac protein signatures is accessible through an online database: atlas.cardiacproteomics.com . Our combined analysis allows for quantitative evaluation of protein abundances across cardiac chambers, as well as comparisons of cardiac protein profiles across model organisms. Up to a quarter of proteins with differential abundances between atria and ventricles showed opposite chamber-specific enrichment between species; these included numerous proteins implicated in cardiac disease. The generated proteomics resource facilitates translational prospects of cardiac studies from model organisms to humans by comparisons of disease-linked protein networks across species.
... Our combined transcriptome and proteomic analyses showed that less than 4% of the total differentially regulated genes in the ID placentas were changed both at the transcriptional and translational levels. A large discrepancy between transcriptome and proteome is not uncommon in human and mouse tissues (57)(58)(59)(60)(61), and may be partly explained by post-transcriptional and post-translational regulatory mechanisms of gene expression, as well as the possibility that some proteins in placentas, especially those with endocrine and circulatory functions, may be transported from other tissues rather than synthesized within the placenta. As such, the transcript level of a gene does not always correlate with its protein abundance, and our results highlight the need to examine treatment effects at both transcriptional and translational levels. ...
Article
Background: Maternal iron deficiency (ID) is associated with poor pregnancy and fetal outcomes. The effect is thought to be mediated by the placenta but there is no comprehensive assessment of placental responses to maternal ID. Additionally, whether the influence of maternal ID on the placenta differs by fetal sex is unknown. Objectives: To identify gene and protein signatures of ID mouse placentas at mid-gestation. A secondary objective was to profile the expression of iron genes in mouse placentas across gestation. Methods: We used a real-time PCR-based array to determine the mRNA expression of all known iron genes in mouse placentas at embryonic day (E) 12.5, E14.5, E16.5, and E19.5 (n = 3 placentas/time point). To determine the effect of maternal ID, we performed RNA sequencing and proteomics in male and female placentas from ID and iron-adequate mice at E12.5 (n = 8 dams/diet). Results: In female placentas, 6 genes, including transferrin receptor (Tfrc) and solute carrier family 11 member 2, were significantly changed by maternal ID. An additional 154 genes were altered in male ID placentas. A proteomic analysis quantified 7662 proteins in the placenta. Proteins translated from iron-responsive element (IRE)–containing mRNA were altered in abundance; ferritin and ferroportin 1 decreased, while TFRC increased in ID placentas. Less than 4% of the significantly altered genes in ID placentas occurred both at the transcriptional and translational levels. Conclusions: Our data demonstrate that the impact of maternal ID on placental gene expression in mice is limited in scope and magnitude at mid-gestation. We provide strong evidence for IRE-based transcriptional and translational coordination of iron gene expression in the mouse placenta. Finally, we discover sexually dimorphic effects of maternal ID on placental gene expression, with more genes and pathways altered in male compared with female mouse placentas.
... Inferring the exact cellular concentration of proteins from transcript data is imprecise because, on average, protein level correlates with the abundance of the corresponding mRNA with a squared Pearson correlation coefficient of 0.40 [100]. A stronger positive correlation between Ca2+ regulatory transcript abundance and protein expression, however, has been shown in a mouse microarray-proteomic study that included calsequestrin (CASQ2 r = 0.999893) and the sodium/potassium-transporting ATPase catalytic subunit (ATP1A1 r = 0.901647) [101]. Vangheluwe et al. found that SLN mRNA expression in skeletal muscle correlates positively with SLN peptide expression in mouse, rat, rabbit, and pig skeletal muscle [15]. ...
Article
Ca2+ regulation in equine muscle is important for horse performance, yet little is known about this species-specific regulation. We reported recently that horses encode unique gene and protein sequences for the sarcoplasmic reticulum (SR) Ca2+-transporting ATPase (SERCA) and the regulatory subunit sarcolipin (SLN). Here we quantified gene transcription and protein expression of SERCA and its inhibitory peptides in horse gluteus, as compared to commonly-studied rabbit skeletal muscle. RNA sequencing and protein immunoblotting determined that horse gluteus expresses the ATP2A1 gene (SERCA1) as the predominant SR Ca2+-ATPase isoform and the SLN gene as the most-abundant SERCA inhibitory peptide, as also found in rabbit skeletal muscle. Equine muscle expresses an insignificant level of phospholamban (PLN), another key SERCA inhibitory peptide expressed commonly in a variety of mammalian striated muscles. Surprisingly in horse, the RNA transcript ratio of SLN-to-ATP2A1 is an order of magnitude higher than in rabbit, while the corresponding protein expression ratio is an order of magnitude lower than in rabbit. Thus, SLN is not efficiently translated or maintained as a stable protein in horse muscle, suggesting a non-coding role for supra-abundant SLN mRNA. We propose that the lack of SLN and PLN inhibition of SERCA activity in equine muscle is an evolutionary adaptation that potentiates Ca2+ cycling and muscle contractility in a prey species domestically selected for speed.
... However, mammals show high correlation between RNA expression and protein quantities. 43 In addition, significant correlation between gene expression at the mRNA and protein levels has been found in hibernating and active ground squirrels. 44 Restricted resources prevented us from performing protein expression studies to confirm our findings. ...
Article
Adipose-derived mesenchymal stem cells (ADSCs) are promising candidates for novel cell therapeutic applications. Hibernating brown bears sustain tissue integrity and function via unknown mechanisms, which might be plasma borne. We hypothesized that plasma from hibernating bears may increase the expression of favorable factors from human ADSCs. In an experimental study, ADSCs from patients with ischemic heart disease were treated with interventional media containing plasma from hibernating and active bears, respectively, and with control medium. Extracted RNA from the ADSCs was sequenced using next generation sequencing. Statistical analyses of differentially expressed genes were performed using fold change analysis, pathway analysis, and gene ontology. As a result, we found that genes associated with inflammation, such as IGF1, PGF, IL11, and TGFA, were downregulated by > 10-fold in ADSCs treated with winter plasma compared with control. Genes important for cardiovascular development, ADM, ANGPTL4, and APOL3, were upregulated in ADSCs when treated with winter plasma compared with summer plasma. ADSCs treated with bear plasma, regardless if it was from hibernating or active bears, showed downregulation of IGF1, PGF, IL11, INHBA, IER3, and HMOX1 compared with control, suggesting reduced cell growth and differentiation. This can be summarized in the conclusion that plasma from hibernating bears suppresses inflammatory genes and activates genes associated with cardiovascular development in human ADSCs. Identifying the involved regulator(s) holds therapeutic potential.
... A unique feature of proteomics technologies is the ability to detect subcellular localizations [207], protein complexes [208] and post-translational modifications [209]. These so-called proteoforms [8] must be directly detected at the protein level and cannot be simply predicted from upstream genomics/transcriptomics data. ...
Article
Cancer biomarkers have transformed current practices in the oncology clinic. Continued discovery and validation are crucial for improving early diagnosis, risk stratification, and monitoring patient response to treatment. Profiling of the tumour genome and transcriptome are now established tools for the discovery of novel biomarkers, but alterations in proteome expression are more likely to reflect changes in tumour pathophysiology. In the past, clinical diagnostics have strongly relied on antibody-based detection strategies, but these methods carry certain limitations. Mass spectrometry (MS) is a powerful method that enables increasingly comprehensive insights into changes of the proteome to advance personalized medicine. In this review, recent improvements in MS-based clinical proteomics are highlighted with a focus on oncology. We will provide a detailed overview of clinically relevant sample types, as well as considerations for sample preparation methods, protein quantitation strategies, MS configurations, and data analysis pipelines currently available to researchers. Critical consideration of each step is necessary to address the pressing clinical questions that advance cancer patient diagnosis and prognosis. While the majority of studies focus on the discovery of clinically-relevant biomarkers, there is a growing demand for rigorous biomarker validation. These studies focus on high-throughput targeted MS assays and multi-centre studies with standardized protocols. Additionally, improvements in MS sensitivity are opening the door to new classes of tumour-specific proteoforms including post-translational modifications and variants originating from genomic aberrations. Overlaying proteomic data to complement genomic and transcriptomic datasets forges the growing field of proteogenomics, which shows great potential to improve our understanding of cancer biology.
Overall, these advancements not only solidify MS-based clinical proteomics' integral position in cancer research, but also accelerate the shift towards becoming a regular component of routine analysis and clinical practice.
... 18 Notably, although the levels of mRNA (arising from gene expression) and protein are positively correlated, the correlation is weak, that is, it is not a perfect correlation. 19,20 Therefore, elucidation of the specifics of post-transcriptional mRNA regulation can help clarify the role of genes involved in various cellular events. ...
Article
Intracellular mRNA levels are not always proportional to their respective protein levels, especially in the placenta. This discrepancy may be attributed to various factors including post‐transcriptional regulation, such as mRNA methylation (N6‐methyladenosine: m⁶A). Here, we conducted a comprehensive m⁶A analysis of human placental tissue from neonates with various birth weights to clarify the involvement of m⁶A in placental biology. The augmented m⁶A levels at the 5′‐untranslated region (UTR) in mRNAs of small‐for‐date placenta samples were dominant compared to reduction of m⁶A levels, whereas a decrease in m⁶A in the vicinity of stop codons was common in heavy‐for‐date placenta samples. Notably, most of these genes showed similar expression levels between the different birth weight categories. In particular, preeclampsia placenta samples showed consistently upregulated SMPD1 protein levels and increased m⁶A at 5′‐UTR but did not show increased mRNA levels. Mutagenesis of adenosines at 5′‐UTR of SMPD1 mRNAs actually decreased protein levels in luciferase assay. Collectively, our findings suggest that m⁶A both at the 5′‐UTR and in the vicinity of stop codon in placental mRNA may play important roles in fetal growth and disease.
... Notably, swine adaptation and swine-adapted IAVs are closely related to human pandemics. All of the last five recorded influenza pandemics were caused by avian-origin, swine-origin or reassortant IAVs (Reid, Taubenberger and Fanning 2004;Kislinger et al. 2006;Bragstad et al. 2011;Long et al. 2019). Thus, it is of great importance to predict the adaptation of avian or swine IAVs to humans. ...
Article
Each influenza pandemic was caused at least partly by avian- and/or swine-origin influenza A viruses (IAVs). The timing of and the potential IAVs involved in the next pandemic are currently unpredictable. We aim to build machine learning (ML) models to predict human-adaptative IAV nucleotide composition. 217,549 IAV full-length coding sequences of the PB2 (Polymerase basic protein-2), PB1, PA (Polymerase acidic protein), HA (Hemagglutinin), NP (Nucleoprotein), NA (Neuraminidase) segments were decomposed for their codon position-based mononucleotides (12 nts) and dinucleotides (48 dnts). 68,742 human sequences and 68,739 avian sequences (1:1) were resampled to characterize the human adaptation-associated (d)nts with principal component analysis (PCA) and other ML models. Then, the human adaptation of IAV sequences was predicted based on the characterized (d)nts. Respectively, 9, 12, 11, 13 and 10 human-adaptive (d)nts were optimized for the six segments. PCA and hierarchical clustering analysis revealed the linear separability of the optimized (d)nts between the human-adaptive and avian-adaptive sets. The results of the confusion matrix and the area under the receiver operating characteristic (ROC) curve (AUC) indicated a high performance of the ML models to predict human adaptation of IAVs. Our model performed well in predicting the human adaptation of the swine/avian IAVs before and after the 2009 H1N1 pandemic. In conclusion, we identified the human adaptation-associated genomic composition of IAV segments. ML models for IAV human adaptation prediction using large IAV genomic datasets can facilitate the identification of key viral factors that affect virus transmission/pathogenicity. Most importantly, it allows the prediction of pandemic influenza.
... The processes of SCF and PE were optimized by methods suggested in literature reports, 34,35,[58][59][60] and the isolated fractions are available for subsequent HPLC-MS analysis. ...
Article
Auranofin (AuRF) has been reported to display anticancer activity and has entered several clinical trials; however, its mechanism of action remains largely unknown. In this work, the anticancer mechanism of auranofin was investigated using a proteomics strategy entailing subcellular fractionation prior to mass spectrometric analysis. Bioinformatics analysis of the nuclear sub-proteomes revealed that tumor suppressor p14ARF is a key regulator of transcription. Through independent analysis, we validated that up-regulation of p14ARF is associated with E2F-dependent transcription and increased p53 expression. Our analyses further reveal that 3-hydroxy-3-methylglutaryl-coenzyme A reductase (HMGCR), which is the rate-determining enzyme of the mevalonate pathway, is a novel target of auranofin with half maximal inhibitory concentration at micromolar levels. The auranofin-induced cancer cell death could be partially reverted by the addition of downstream products of the mevalonate pathway (mevalonolactone or geranylgeranyl pyrophosphate (GGPP)), implying that auranofin may target the mevalonate pathway to exert its anticancer effect.
... Compartmentalization is an essential characteristic of eukaryotic cells, ensuring that cellular processes are partitioned to defined subcellular locations. High throughput microscopy 1 and biochemical fractionation coupled with mass spectrometry [2][3][4][5][6] have helped to define the proteomes of multiple organelles and macromolecular structures. However, many compartments have remained refractory to such methods, partly due to lysis and purification artefacts and poor subcompartment resolution. ...
Preprint
Compartmentalization is an essential characteristic of eukaryotic cells, ensuring that cellular processes are partitioned to defined subcellular locations. High throughput microscopy and biochemical fractionation coupled with mass spectrometry have helped to define the proteomes of multiple organelles and macromolecular structures. However, many compartments have remained refractory to such methods, partly due to lysis and purification artefacts and poor subcompartment resolution. Recently developed proximity-dependent biotinylation approaches such as BioID and APEX provide an alternative avenue for defining the composition of cellular compartments in living cells. Here we report an extensive BioID-based proximity map of a human cell, comprising 192 markers from 32 different compartments that identifies 35,902 unique high confidence proximity interactions and localizes 4,145 proteins expressed in HEK293 cells. The recall of our localization predictions is on par with or better than previous large-scale mass spectrometry and microscopy approaches, but with higher localization specificity. In addition to assigning compartment and subcompartment localization for many previously unlocalized proteins, our data contain fine-grained localization information that, for example, allowed us to identify proteins with novel roles in mitochondrial dynamics. As a community resource, we have created humancellmap.org, a website that allows exploration of our data in detail, and aids with the analysis of BioID experiments.
... In order to determine potential receptor and NE-associated protein complexes that are implicated in the recruitment of centrosomal proteins to the nucleus, we performed small interfering RNA (siRNA) screens using the myogenic precursor C2C12 cell line and automated immunofluorescence microscopy (Figure 3.2). Based on previous published mass spectrometry studies of NE fractions, we chose 238 potential nuclear transmembrane proteins or NE-associated proteins, respectively, and conceived a pool of three siRNAs for each gene (Kislinger et al., 2006;Schirmer et al., 2003;Wilkie et al., 2011). After transfection and subsequent differentiation of C2C12 myogenic precursor cells, myotubes were stained and automatically imaged. ...
Thesis
The accurate position of the nucleus during skeletal muscle formation seems to be important for muscle function, and defects have been associated with numerous muscle diseases. Nuclear positioning requires microtubules (MTs) which are reorganized from the centrosome in proliferating myoblasts to the nuclear envelope (NE) in differentiated myotubes. This dramatic MT reorganization is accompanied by a redistribution of proteins from the centrosome to the NE which thus takes over the function as a microtubule-organizing center (MTOC) during myogenic differentiation. However, the underlying mechanisms are still unknown. Here, we identified Nesprin-1 and Sun1/2, outer and inner nuclear membrane proteins, respectively, to be involved in the recruitment of MTOC function to the NE. Nesprin-1 or Sun1/2 deficient cells displayed mislocalization of centrosomal proteins to the cytoplasm and failed to regrow MTs from the NE. Moreover, the muscle-specific isoform of Nesprin-1, namely Nesprin-1alpha, was shown to be highly associated with the centrosomal proteins Akap450, Pericentrin and Pcm1 in C2C12 myotubes and to be sufficient to rescue the observed defects in Nesprin-1 depleted cells. Among the centrosomal proteins localizing at the NE during myogenic differentiation, solely Akap450 seemed to be required for MT nucleation. Akap450-Nesprin-1alpha-mediated MT nucleation from the NE was demonstrated to play an important role in nuclear positioning during myotube formation. These findings strengthen our understanding of how defects in MTOC formation at the NE can link to nuclear positioning defects in muscular dystrophies.
... 2.7 rev. 9 search engine. 2. Run the putative matches identified by SEQUEST through the STATQUEST [22] filtering algorithm to assign statistical confidence. ...
Chapter
Mitochondria (mt) are double-membraned, dynamic organelles that play an essential role in a large number of cellular processes, and impairments in mt function have emerged as a causative factor for a growing number of human disorders. Given that most biological functions are driven by physical associations between proteins, the first step towards understanding mt dysfunction is to map its protein-protein interaction (PPI) network in a comprehensive and systematic fashion. While mass-spectrometry (MS)-based approaches possess the high sensitivity ideal for such an endeavor, they also require stringent biochemical purification of bait proteins to avoid detecting spurious, non-specific PPIs. Here, we outline a tagging-based affinity purification coupled with mass spectrometry (AP-MS) workflow for discovering new mt protein associations and providing novel insights into their role in mt biology and human physiology/pathology. Because AP-MS relies on the creation of proteins fused with affinity tags, we employ a versatile-affinity (VA) tag, consisting of 3× FLAG, 6× His, and Strep III epitopes. For efficient delivery of affinity-tagged open reading frames (ORF) into mammalian cells, the VA-tag is cloned onto a specific ORF using Gateway recombinant cloning, and the resulting expression vector is stably introduced in target cells using lentiviral transduction. In this chapter, we show a functional workflow for mapping the mt interactome that includes tagging, stable transduction, selection and expansion of mammalian cell lines, mt extraction, identification of interacting protein partners by AP-MS, and lastly, computational assessment of protein complexes/PPI networks.
... Because data from proteomic analysis only provide isolated protein information, presumably with some false positives [22], we used a data-integrative approach to identify reliable mt proteins. We compared our proteome set with previously characterized mt proteome sets from mouse hearts [23,24] and confirmed that 98% (172 of 176) proteins matched known mt proteins (Fig. 2E, Supplementary Table S4). We investigated the physical and/or functional associations of these 172 proteins based on the notion that protein pairs that physically interact or share similar functions tend to be located in the same subcellular organelle [25]. ...
Article
Tetrahydrobiopterin (BH4) shows therapeutic potential as an endogenous target in cardiovascular diseases. Although it is involved in cardiovascular metabolism and mitochondrial biology, its mechanisms of action are unclear. We investigated how BH4 regulates cardiovascular metabolism using an unbiased multiple proteomics approach with a sepiapterin reductase knock-out (Spr-/-) mouse as a model of BH4 deficiency. Spr-/- mice exhibited a shortened life span, cardiac contractile dysfunction, and morphological changes. Multiple proteomics and systems-based data-integrative analyses showed that BH4 deficiency altered cardiac mitochondrial oxidative phosphorylation. Along with decreased transcription of major mitochondrial biogenesis regulatory genes, including Ppargc1a, Ppara, Esrra, and Tfam, Spr-/- mice exhibited lower mitochondrial mass and severe oxidative phosphorylation defects. Exogenous BH4 supplementation, but not nitric oxide supplementation or inhibition, rescued these cardiac and mitochondrial defects. BH4 supplementation also recovered mRNA and protein levels of PGC1α and its target proteins involved in mitochondrial biogenesis (mtTFA and ERRα), antioxidation (Prx3 and SOD2), and fatty acid utilization (CD36 and CPTI-M) in Spr-/- hearts. These results indicate that BH4-activated transcription of PGC1α regulates cardiac energy metabolism independently of nitric oxide and suggests that BH4 has therapeutic potential for cardiovascular diseases involving mitochondrial dysfunction.
... Towards this goal, the assembly of comprehensive quantitative proteome maps has been an active area of study for different cells and tissues across several model organisms ( Fig. 8.1a). For instance, early studies profiled the protein composition of primary organ tissues using label-free spectral counts in mice [34] and zebrafish [35]. Advances in instrumentation and software have enabled intensity-based LFQ to achieve deeper proteome coverage and greater quantitative precision, as demonstrated by the initial drafts of the human proteome assembled using over 50 tissues and fluids and more than 100 cell lines [36,37]. ...
Chapter
Understanding multicellular organism development from a molecular perspective is no small feat, yet this level of comprehension affords clinician-scientists the ability to identify root causes and mechanisms of congenital diseases. Inarguably, the maturation of molecular biology tools has significantly contributed to the identification of genetic loci that underlie normal and aberrant developmental programs. In combination with cell biology approaches, these tools have begun to elucidate the spatiotemporal expression and function of developmentally-regulated proteins. The emergence of quantitative mass spectrometry (MS) for biological applications has accelerated the pace at which these proteins can be functionally characterized, driving the construction of an increasingly detailed systems biology picture of developmental processes. Here, we review the quantitative MS-based proteomic technologies that have contributed significantly to understanding the role of proteome regulation in developmental processes. We provide a brief overview of these methodologies, focusing on their ability to provide precise and accurate proteome measurements. We then highlight the use of discovery-based and targeted mass spectrometry approaches in model systems to study cellular differentiation states, tissue phenotypes, and spatiotemporal subcellular organization. We also discuss the current application and future perspectives of MS proteomics to study PTM coordination and the role of protein complexes during development.
... Moreover, Capture-HiC from GM12878 cells show the interactions of CELF4, where both the translocation breakpoints and the cluster of three obesity-associated SNPs are located within the CELF4 interaction range features in heterozygotic Celf4 +/− mice [36,37]. We cannot assess CELF4 expression in the translocation carriers, because it is exclusively expressed in the central nervous system in adults [36,38,39]. However, based on the circumstantial genetic data about the CELF4 domain and obesity, we speculate that the high BMI in the adult translocation carriers could be explained by a change of CELF4 gene regulation by a long-range positional effect. ...
Article
Family studies have established that the heritability of blood pressure is significant and genome-wide association studies (GWAS) have identified numerous susceptibility loci, including one within the non-coding part of Rho GTPase-activating protein 42 gene (ARHGAP42) on chromosome 11q22.1. Arhgap42-deficient mice have significantly elevated blood pressure, but the phenotypic effects of human variants in the coding part of the gene are unknown. In a Danish cohort of carriers with apparently balanced chromosomal rearrangements, we identified a family where a reciprocal translocation t(11;18)(q22.1;q12.2) segregated with hypertension and obesity. Clinical re-examination revealed that four carriers (age 50–77 years) have had hypertension for several years along with an increased body mass index (34–43 kg/m²). A younger carrier (age 23 years) had normal blood pressure and body mass index. Mapping of the chromosomal breakpoints with mate-pair and Sanger sequencing revealed truncation of ARHGAP42. A decreased expression level of ARHGAP42 mRNA in the blood was found in the translocation carriers relative to controls and allele-specific expression analysis showed monoallelic expression in the translocation carriers, confirming that the truncated allele of ARHGAP42 was not expressed. These findings support that haploinsufficiency of ARHGAP42 leads to an age-dependent hypertension. The other breakpoint truncated a regulatory domain of the CUGBP Elav-like family member 4 (CELF4) gene on chromosome 18q12.2 that harbours several GWAS signals for obesity. We thereby provide additional support for an obesity locus in the CELF4 domain.
... However, they do not accurately predict the expression of the protein or whether the protein is stable and functional. Several studies have shown there is only a partial concordance between transcripts and their associated proteins [11][12][13][14]. Studies performed by Ning et al. and Akbani et al. found concordance of protein to mRNA to be in the range of 0.5-0.55 ...
Article
Introduction: Cancer changes the proteome in complex ways that reach well beyond simple changes in protein abundance. Genomic and transcriptional variations and post-translational protein modification create functional variants of a protein, known as proteoforms. Childhood cancers have fewer genomic alterations but show equally dramatic phenotypic changes as malignant cells in adults. Therefore, unraveling the complexities of the proteome is even more important in pediatric malignancies. Areas covered: In this review, the biological origins of proteoforms and technological advancements in the study of proteoforms are discussed. Particular emphasis is given to their implication in childhood malignancies and the critical role of cancer-specific proteoforms for the next generation of cancer therapies and diagnostics. Expert opinion: Recent advancements in technology have led to a better understanding of the underlying mechanisms of tumourigenesis. This has been critical for the development of more effective and less harmful treatments that are based on direct targeting of altered proteins and deregulated pathways. As proteome coverage and the ability to detect complex proteoforms increase, the greatest need for change is in data compilation and database availability to mediate high-level data analysis and allow for better functional annotation of proteoforms.
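The excerpt citing this review reports protein-to-mRNA concordance of roughly 0.5-0.55. As a minimal sketch of how such a figure might be computed, the block below assumes that concordance means a Pearson correlation over log-transformed paired abundances; the excerpt does not name the statistic, and the abundance values and the function name `pearson` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired measurements for five genes (arbitrary units).
mrna = [10.0, 200.0, 35.0, 80.0, 500.0]
protein = [50.0, 90.0, 400.0, 60.0, 900.0]

# Abundances span orders of magnitude, so correlate on a log scale
# (an assumption; a rank-based statistic would be another common choice).
r = pearson([math.log10(v) for v in mrna],
            [math.log10(v) for v in protein])
print(f"log-scale Pearson r = {r:.2f}")
```

A rank-based statistic such as Spearman's correlation is less sensitive to outliers and could be swapped in without changing the rest of the sketch.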
... Variations of hyperLOPIT have recently been employed by Beltran et al. 51 , who integrated a temporal component to the workflow to analyse human lung fibroblast cytomegalovirus infection and Jadot et al. 52 , who used Nycodenz and sucrose density gradient centrifugation to determine the rat liver organelle proteome. A label-free alternative to LOPIT, Protein Correlation Profiling (PCP), has also been developed and applied to the study of the centrosome 53 and lipid droplets 54 as well as global organelle analyses 55,56 . Additionally, PCP has been used to study the proteasome complexes of P. falciparum 57 and combined with Stable Isotope Labelling with Amino acids in cell Culture (SILAC) to investigate protein-protein interactions with temporal and stoichiometric resolution 58,59 . ...
Article
The study of protein localisation has greatly benefited from high-throughput methods utilising cellular fractionation and proteomic profiling. Hyperplexed Localisation of Organelle Proteins by Isotope Tagging (hyperLOPIT) is a well-established method in this area. It achieves high-resolution separation of organelles and subcellular compartments but is relatively time- and resource-intensive. As a simpler alternative, we here develop Localisation of Organelle Proteins by Isotope Tagging after Differential ultraCentrifugation (LOPIT-DC) and compare this method to the density gradient-based hyperLOPIT approach. We confirm that high-resolution maps can be obtained using differential centrifugation down to the suborganellar and protein complex level. HyperLOPIT and LOPIT-DC yield highly similar results, facilitating the identification of isoform-specific localisations and high-confidence localisation assignment for proteins in suborganellar structures, protein complexes and signalling pathways. By combining both approaches, we present a comprehensive high-resolution dataset of human protein localisations and deliver a flexible set of protocols for subcellular proteomics.
Article
Mass spectrometry-based proteomics is a sophisticated identification tool specializing in portraying protein dynamics at a molecular level. Proteomics provides biologists with a snapshot of context-dependent protein expression, isoform conformations, dynamic turnover information, and data on direct protein-protein interactions. Cardiac proteomics offers researchers and clinicians a deeper understanding of the molecular mechanisms that underscore cardiovascular disease, and is foundational to the development of future therapeutic interventions. This review encapsulates the evolution, current technologies, and future perspectives of proteomic-based mass spectrometry as it applies to the study of the heart. Key technological advancements have allowed researchers to study proteomes at the single-cell level and to employ robot-assisted automation systems for enhanced sample preparation, while the increased fidelity of mass spectrometers has allowed the unambiguous identification of numerous dynamic post-translational modifications (PTMs). Animal models of cardiovascular disease, ranging from early animal experiments to current sophisticated models of heart failure with preserved ejection fraction (HFpEF), have provided the tools to study a challenging organ in the laboratory. Further technological development will pave the way for the implementation of proteomics even closer within the clinical setting, allowing not only scientists but also patients to benefit from an understanding of protein interplay as it relates to cardiac disease physiology.
Preprint
Throughout history, Influenza A viruses (IAVs) have caused significant harm and catastrophic pandemics. The presence of host barriers results in viral host tropism, where infected hosts are subject to strict restrictions due to the hindered spread of viruses across hosts. Therefore, the identification of host tropism of IAVs, particularly in humans, is crucial to preventing the cross-host transmission of avian viruses and their outbreaks in humans. Nevertheless, efficiently and effectively identifying host tropism, especially for early host susceptibility warnings based on viral genome sequences during outbreak onset, remains challenging. To address this challenge, we propose Flu-CNN, a deep neural network model based on classical character-level convolutional networks. By analyzing the genomic segments of IAVs, Flu-CNN can accurately identify the host tropism, with a particular focus on avian influenza viruses that may infect humans. According to our experimental evaluations, Flu-CNN achieved an accuracy of 99% in identifying virus hosts via only a single genomic segment, even for subtypes with a relatively small number of viral strains such as H5N1, H7N9, and H9N2. The superiority of Flu-CNN demonstrates its effectiveness in screening for critical amino acid mutations, which are important to host adaptation, and in zoonotic risk prediction of viral strains. Flu-CNN is a valuable tool for evolutionary characterization, monitoring potential outbreaks, and preventing the epidemic spread of IAVs, all of which contribute to the effective surveillance of influenza A viruses.
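Character-level convolutional models typically take a sequence as a matrix of one-hot vectors, one row per character. The sketch below shows that encoding step for a nucleotide string. The alphabet `ACGT`, the all-zero row for ambiguous bases, and the function name `one_hot` are assumptions for illustration, not details of Flu-CNN's actual preprocessing.

```python
ALPHABET = "ACGT"  # assumed nucleotide alphabet; real pipelines may differ


def one_hot(sequence, alphabet=ALPHABET):
    """Encode a sequence as a list of one-hot rows, one row per character.

    Characters outside the alphabet (for example the ambiguity code N)
    become all-zero rows, which is one common convention among several.
    """
    index = {ch: i for i, ch in enumerate(alphabet)}
    rows = []
    for ch in sequence.upper():
        row = [0] * len(alphabet)
        if ch in index:
            row[index[ch]] = 1
        rows.append(row)
    return rows


# Hypothetical four-base fragment; the last base is ambiguous.
encoded = one_hot("ACGN")
for base, row in zip("ACGN", encoded):
    print(base, row)
```

The resulting matrix has shape (sequence length, alphabet size), which is the kind of input a 1D convolution layer slides its filters over.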
Article
Fibroblast growth factors (FGFs) are key regulators of the remarkable regenerative capacity of the liver. Mice lacking FGF receptors 1 and 2 (Fgfr1 and Fgfr2) in hepatocytes are hypersensitive to cytotoxic injury during liver regeneration. Using these mice as a model for impaired liver regeneration, we identified a critical role for the ubiquitin ligase Uhrf2 in protecting hepatocytes from bile acid accumulation during liver regeneration. During regeneration after partial hepatectomy, Uhrf2 expression increased in an FGFR-dependent manner, and Uhrf2 was more abundant in the nuclei of liver cells in control mice compared with FGFR-deficient mice. Hepatocyte-specific Uhrf2 knockout or nanoparticle-mediated Uhrf2 knockdown caused extensive liver necrosis and impaired hepatocyte proliferation after partial hepatectomy, resulting in liver failure. In cultured hepatocytes, Uhrf2 interacted with several chromatin remodeling proteins and suppressed the expression of cholesterol biosynthesis genes. In vivo, the loss of Uhrf2 resulted in cholesterol and bile acid accumulation in the liver during regeneration. Treatment with a bile acid scavenger rescued the necrotic phenotype, hepatocyte proliferation, and the regenerative capacity of the liver in Uhrf2-deficient mice subjected to partial hepatectomy. Our results identify Uhrf2 as a key target of FGF signaling in hepatocytes and its essential function in liver regeneration and highlight the importance of epigenetic metabolic regulation in this process.
Article
The mouse is a valuable model organism for biomedical research. Here, we established a comprehensive spectral library and the DIA-based quantitative proteome maps for 41 mouse organs, including some rarely reported organs such as the cornea, retina, and nine paired organs. The mouse spectral library contained 178,304 peptides from 12,320 proteins, including 1678 proteins not reported in previous mouse spectral libraries. Our data suggested that organs from the nervous system and immune system expressed the most distinct proteomes compared with other organs. We also found characteristic protein expression of immune-privileged organs, which may help in understanding possible immune rejection after organ transplantation. Each tissue type expressed characteristic high-abundance proteins related to its physiological functions. We also uncovered some tissue-specific proteins which have not been reported previously. The testis expressed the highest number of tissue-specific proteins. By comparison of nine paired organs including kidneys, testes, and adrenal glands, we found left organs exhibited higher levels of antioxidant enzymes. We also observed expression asymmetry for proteins related to the apoptotic process, tumor suppression, and organ functions between the left and right sides. This study provides a comprehensive spectral library and a quantitative proteome resource for mouse studies.
Article
Aim: Subcellular fractionation is often used to determine the subcellular localisation of proteins, including whether a protein translocates to the nucleus in response to a given stimulus. Examining nuclear proteins in skeletal muscle is difficult because myonuclear proteins are challenging to isolate unless harsh treatments are used. This study aimed to determine the most effective method for isolating and preserving proteins in their native state in skeletal muscle. Methods: We compared the ability of detergents, commercially-available kit-based and K⁺-based physiological methodologies for isolating myonuclear proteins from resting samples of human muscle by determining the presence of marker proteins for each fraction by Western blot analyses. Results: We found that following the initial pelleting of nuclei, treatment with 1% Triton X-100, 1% CHAPS or 0.5% Na-deoxycholate under various ionic conditions resulted in the nuclear proteins being either resistant to isolation or the proteins present behaving aberrantly. The nuclear proteins in brain tissue were also resistant to 1% Triton X-100 isolation. Here, we demonstrate aberrant behaviour and erroneous localisation of proteins using the kit-based method. The aberrant behaviour was the activation of Ca²⁺-dependent protease calpain-3, and the erroneous localisation was the presence of calpain-3 and troponin I in the nuclear fraction. Conclusion: Our findings indicate that it may not be possible to reliably determine the translocation of proteins between subcellular locations and the nucleus using subcellular fractionation techniques. This study highlights the importance of validating subcellular fractionation methodologies using several subcellular-specific markers and solutions that are physiologically relevant to the intracellular milieu.
Article
Successful pollination brings together the mature Brassica pollen grain and stigma papilla initiating an intricate series of molecular processes meant to eventually enable sperm cell delivery for fertilization and reproduction. At maturity, the pollen and stigma cells have acquired proteomes comprising the primary molecular effectors required upon their meeting. Knowledge of the roles and global composition of these proteomes in Brassica species is largely lacking. To address this gap, gel‐free shotgun proteomics was performed on the mature pollen and stigma of Brassica carinata, a representative of the Brassica family and its many crop species (e.g. B. napus, B. oleracea, B. rapa), which holds considerable potential as a bio‐industrial crop. 5608 and 7703 B. carinata mature pollen and stigma proteins were identified, respectively. The pollen and stigma proteomes were found to reflect not only their many common functional and developmental objectives, but also the important differences underlying their cellular specialization. Isobaric tag for relative and absolute quantification (iTRAQ) was exploited in the first analysis of a developing Brassicaceae stigma, and uncovered 251 B. carinata proteins that were differentially abundant during stigma maturation, providing insight into proteins involved in the initial phases of pollination. Corresponding pollen and stigma transcriptomes were also generated highlighting functional divergences between the proteome and transcriptome during different stages of pollen‐stigma interaction. This study illustrates the investigative potential of combining the most comprehensive Brassicaceae pollen and stigma proteomes to date with iTRAQ and transcriptome data to provide a unique global perspective of pollen and stigma development and interaction.
Chapter
Every cell is equipped, through the differentiation process, with a given mitochondrial load and “emphasis” depending on its type and functions. Then, diverse genetic, physiological, and environmental cues determine its dynamics, maintenance, and turnover through a complex and still scarcely known signaling network coordinating the interplay between the two genomes involved: nuclear and mitochondrial. Adjusting the amount, composition, distribution, and activities of mitochondria to the energetic and other cell demands is central for cell physiology, and dysregulation of these processes may cause or contribute to a wide range of pathologies. We will outline the energy metabolic profiles of the more relevant tissues and cell types, and then we describe the main factors of mitochondrial heterogeneity among tissues (differences in mitochondrial mass, composition, and biogenesis) and discuss some of the mechanisms that underlie these differences and their consequences in physiology.
Article
Cardiovascular diseases remain the most rapidly rising contributing factor of all-cause mortality and the leading cause of inpatient hospitalization worldwide, with costs exceeding $30 billion annually in North America. Cell surface and membrane-associated proteins play an important role in cardiomyocyte biology and are involved in the pathogenesis of many human heart diseases. In cardiomyocytes, membrane proteins serve as critical signaling receptors, Ca²⁺ cycling regulators, and electrical propagation regulators, all functioning in concert to maintain spontaneous and synchronous contractions of cardiomyocytes. Membrane proteins are excellent pharmaceutical targets due to their uniquely exposed position within the cell. Perturbations in cardiac membrane protein localization and function have been implicated in the progression and pathogenesis of many heart diseases. However, previous attempts at profiling the cardiac membrane proteome have yielded limited results due to poor technological developments for isolating hydrophobic, low-abundance membrane proteins. Comprehensive mapping and characterization of the cardiac membrane proteome thereby remains incomplete. This review will focus on recent advances in mapping the cardiac membrane proteome and the role of novel cardiac membrane proteins in the healthy and the diseased heart. Listen to this article's corresponding podcast at https://ajpheart.podbean.com/e/membrane-proteomic-profiling-of-the-heart/ .
Article
In the current study we examined several proteomic- and RNA-Seq-based datasets of cardiac-enriched, cell-surface and membrane-associated proteins in human fetal and mouse neonatal ventricular cardiomyocytes. By integrating available microarray and tissue expression profiles with MGI phenotypic analysis, we identified 173 membrane-associated proteins that are cardiac-enriched, conserved amongst eukaryotic species, and have not yet been linked to a ‘cardiac’ Phenotype-Ontology. To highlight the utility of this dataset, we selected several proteins to investigate more carefully, including FAM162A, MCT1, and COX20, to show cardiac enrichment, subcellular distribution and expression patterns in disease. We performed three-dimensional confocal imaging analysis to validate subcellular localization and expression in adult mouse ventricular cardiomyocytes. FAM162A, MCT1, and COX20 were expressed differentially at the transcriptomic and proteomic levels in multiple models of mouse and human heart diseases and may represent potential diagnostic and therapeutic targets for human dilated and ischemic cardiomyopathies. Altogether, we believe this comprehensive cardiomyocyte membrane proteome dataset will prove instrumental to future investigations aimed at characterizing heart disease markers and/or therapeutic targets for heart failure.
Chapter
One of the key issues in the post-genomic era is to assign functions to uncharacterized proteins. Since proteins seldom act alone, but rather interact with other biomolecular units to execute their functions, the functions of unknown proteins may be discovered through studying their associations with proteins having known functions. In this chapter, the authors discuss possible approaches to exploit protein interaction networks for automated prediction of protein functions. The major focus is on discussing the utilities and limitations of current algorithms and computational techniques for accurate computational function prediction. The chapter highlights the challenges faced in this task and explores how similarity information among different gene ontology (GO) annotation terms can be taken into account to enhance function prediction. The authors describe a new strategy that has better prediction performance than previous methods, which gives additional insights about the importance of the dependence between functional terms when inferring protein function.
Chapter
One of the most prominent properties of networks representing complex systems is modularity. Network-based module identification has captured the attention of a diverse group of scientists from various domains and a variety of methods have been developed. The ability to decompose complex biological systems into modules allows the use of modules rather than individual genes as units in biological studies. A modular view is shaping research methods in biology. Module-based approaches have found broad applications in protein complex identification, protein function prediction, protein expression prediction, as well as disease studies. Compared to single gene-level analyses, module-level analyses offer higher robustness and sensitivity. More importantly, module-level analyses can lead to a better understanding of the design and organization of complex biological systems.
Article
Eukaryotic transcription factors (TFs) coordinate different upstream signals to regulate the expression of their target genes. To unveil this regulatory network in B cell receptor signaling, we developed a computational pipeline to systematically analyze the extracellular signal‐regulated kinase (ERK)‐ and IκB kinase (IKK)‐dependent transcriptome responses. We combined a bilinear regression method and kinetic modeling to identify the signal‐to‐TF and TF‐to‐gene dynamics, respectively. We input a set of time‐course experimental data for B cells and concentrated on transcriptional activators. The results show that the combination of TFs differentially controlled by ERK and IKK could contribute divergent expression dynamics in orchestrating the B cell response. Our findings provide insights into the regulatory mechanisms underlying signal‐dependent gene expression in eukaryotic cells.
Article
The biological significance of the DHTKD1-encoded 2-oxoadipate dehydrogenase (OADH) remains obscure due to its catalytic redundancy with the ubiquitous OGDH-encoded 2-oxoglutarate dehydrogenase (OGDH). In this work, metabolic contributions of OADH and OGDH are discriminated by exposure of cells/tissues with different DHTKD1 expression to the synthesized phosphonate analogues of homologous 2-oxodicarboxylates. The saccharopine pathway intermediates and phosphorylated sugars are abundant when cellular expressions of DHTKD1 and OGDH are comparable, while nicotinate and non-phosphorylated sugars are abundant when DHTKD1 expression is order(s) of magnitude lower than that of OGDH. Using succinyl, glutaryl and adipoyl phosphonates on the enzyme preparations from tissues with varied DHTKD1 expression reveals the contributions of OADH and OGDH to oxidation of 2-oxoadipate and 2-oxoglutarate in vitro. In the phosphonates-treated cells with the high and low DHTKD1 expression, adipate or glutarate, correspondingly, are the most affected metabolites. The marker of fatty acid β-oxidation, adipate, is mostly decreased by the shorter, OGDH-preferring, phosphonate, in agreement with the known OGDH dependence of β-oxidation. The longest, OADH-preferring, phosphonate mostly affects the glutarate level. Coupled decreases in sugars and nicotinate upon the OADH inhibition link the perturbation in glucose homeostasis, known in OADH mutants, to the nicotinate-dependent NAD metabolism.
Article
The synaptic vesicle glycoprotein 2 (SV2) family is comprised of three paralogs: SV2A, SV2B, and SV2C. In vertebrates, SV2s are 12-transmembrane proteins present on every secretory vesicle, including synaptic vesicles, and are critical to neurotransmission. Structural and functional studies suggest that SV2 proteins may play several roles to promote proper vesicular function. Among these roles are their potential to stabilize the transmitter content of vesicles, to maintain and orient the releasable pool of vesicles, and to regulate vesicular calcium sensitivity to ensure efficient, coordinated release of transmitter. The SV2 family is highly relevant to human health in a number of ways. First, SV2A plays a role in neuronal excitability, and as such is the specific target for the antiepileptic drug levetiracetam. SV2 proteins also act as the target by which potent neurotoxins, particularly botulinum, gain access to neurons and exert their toxicity. Both SV2B and SV2C are increasingly implicated in diseases such as Alzheimer’s disease and Parkinson’s disease. Interestingly, despite decades of intensive research, their exact function remains elusive. Thus, SV2 proteins are intriguing in their potentially diverse roles within the presynaptic terminal, and several recent developments have enhanced our understanding and appreciation of the protein family. Here we review the structure and function of SV2 proteins as well as their relevance to disease and therapeutic development.
Preprint
We have analyzed gene transcription, protein expression, and enzymatic activity of the Ca²⁺-transporting ATPase (SERCA) in horse gluteal muscle. Horses are bred for peak athletic performance but exhibit a high incidence of exertional rhabdomyolysis, suggesting Ca²⁺ as a correlative linkage. To assess Ca²⁺ regulation in horse gluteus, we developed an improved protocol for isolating horse sarcoplasmic reticulum (SR) vesicles. RNA-seq and immunoblotting determined that the ATP2A1 gene (protein product SERCA1) is the predominant Ca²⁺ ATPase expressed in horse gluteus, as in rabbit muscle. Gene expression was assessed for four regulatory peptides of SERCA, finding that sarcolipin (SLN) is the predominant regulatory peptide transcript expressed in horse gluteus, as in rabbit muscle. Surprisingly, the RNA transcription ratio of SLN-to-ATP2A1 in horse gluteus is an order of magnitude higher than in rabbit muscle, but conversely, the protein expression ratio of SLN-to-SERCA1 in horse gluteus is an order of magnitude lower than in rabbit. Thus, the SLN gene is not translated to a stable protein in horse gluteus, yet the supra-high level of SLN RNA suggests a non-coding role. Gel-stain analysis revealed that horse SR expresses calsequestrin (CASQ) protein abundantly, with a CASQ-to-SERCA protein ratio ~3-fold greater than rabbit SR. The Ca²⁺ transport rate of horse SR vesicles is ~2-fold greater than rabbit SR, suggesting that horse myocytes have enhanced luminal Ca²⁺ stores that increase intracellular Ca²⁺ release and muscular contractility. We hypothesize that the absence of SLN inhibition of SERCA and the abundant expression of CASQ may potentiate horse susceptibility to exertional rhabdomyolysis.
Article
Background: Recent advances in mass spectrometric instrumentation and bioinformatics have critically contributed to the field of proteogenomics. Nonetheless, whether that integrative approach has reached the point of maturity to effectively reveal the flow of genetic variants from DNA to proteins still remains elusive. The objective of this study was to detect somatically acquired protein variants in breast cancer specimens for which full genome and transcriptome data was already available (BASIS cohort). Methods: LC-MS/MS shotgun proteomic results of 21 breast cancer tissues were coupled to DNA sequencing data to identify variants at the protein level and finally were used to associate protein expression with gene expression levels. Results: Here we report the observation of three sequencing-predicted single amino acid somatic variants. The sensitivity of single amino acid variant (SAAV) detection based on DNA sequencing-predicted single nucleotide variants was 0.4%. This sensitivity was increased to 0.6% when all the predicted variants were filtered for MS “compatibility” and was further increased to 2.9% when only proteins with at least one wild type peptide detected were taken into account. A correlation of mRNA abundance and variant peptide detection revealed that transcripts for which variant proteins were detected ranked among the top 6.3% most abundant transcripts. The variants were detected in highly abundant proteins as well, thus establishing transcript and protein abundance and MS “compatibility” as the main factors affecting variant onco-proteogenomic identification. Conclusions: While proteomics fails to identify the vast majority of exome DNA variants in the resulting proteome, its ability to detect a small subset of SAAVs could prove valuable for precision medicine applications.
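The sensitivities quoted in this abstract rise as the set of predicted variants is narrowed, which is what one would expect if the same detections are divided by a shrinking denominator. The sketch below illustrates that arithmetic, assuming sensitivity is defined as detected variants divided by predicted variants. All counts are hypothetical and are not taken from the study, and `sensitivity` is an illustrative helper name.

```python
def sensitivity(detected, predicted):
    """Fraction of predicted variants that were actually detected."""
    if predicted <= 0:
        raise ValueError("predicted must be positive")
    if not 0 <= detected <= predicted:
        raise ValueError("detected must lie between 0 and predicted")
    return detected / predicted


# Hypothetical counts: the same detections against progressively
# stricter (smaller) sets of predicted variants.
detected = 5
filters = {
    "all predicted variants": 1000,
    "MS-compatible variants only": 400,
    "variants in proteins with a detected peptide": 100,
}

for label, predicted in filters.items():
    pct = 100 * sensitivity(detected, predicted)
    print(f"{label}: {pct:.1f}%")
```

Because the numerator is fixed, each filtering step can only raise the reported sensitivity; it does not mean more variants were found.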
Article
The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence. The rediscovery of Mendel's laws of heredity in the opening weeks of the 20th century [1–3] sparked a scientific quest to understand the nature and content of genetic information that has propelled biology for the last hundred years. The scientific progress made falls naturally into four main phases, corresponding roughly to the four quarters of the century. The first established the cellular basis of heredity: the chromosomes. The second defined the molecular basis of heredity: the DNA double helix. The third unlocked the informational basis of heredity, with the discovery of the biological mechanism by which cells read the information contained in genes and with the invention of the recombinant DNA technologies of cloning and sequencing by which scientists can do the same. The last quarter of a century has been marked by a relentless drive to decipher first genes and then entire genomes, spawning the field of genomics. The fruits of this work already include the genome sequences of 599 viruses and viroids, 205 naturally occurring plasmids, 185 organelles, 31 eubacteria, seven archaea, one fungus, two animals and one plant. Here we report the results of a collaboration involving 20 groups from the United States, the United Kingdom, Japan, France, Germany and China to produce a draft sequence of the human genome. The draft genome sequence was generated from a physical map covering more than 96% of the euchromatic part of the human genome and, together with additional sequence in public databases, it covers about 94% of the human genome.
The sequence was produced over a relatively short period, with coverage rising from about 10% to more than 90% over roughly fifteen months. The sequence data have been made available without restriction and updated daily throughout the project. The task ahead is to produce a finished sequence, by closing all gaps and resolving all ambiguities. Already about one billion bases are in final form and the task of bringing the vast majority of the sequence to this standard is now straightforward and should proceed rapidly. The sequence of the human genome is of interest in several respects. It is the largest genome to be extensively sequenced so far, being 25 times as large as any previously sequenced genome and eight times as large as the sum of all such genomes. It is the first vertebrate genome to be extensively sequenced. And, uniquely, it is the genome of our own species. Much work remains to be done to produce a complete finished sequence, but the vast trove of information that has become available through this collaborative effort allows a global perspective on the human genome. Although the details will change as the sequence is finished, many points are already clear. The genomic landscape shows marked variation in the distribution of a number of features, including genes, transposable elements, GC content, CpG islands and recombination rate. This gives us important clues about function. For example, the developmentally important HOX gene clusters are the most repeat-poor regions of the human genome, probably reflecting the very complex
Article
Full-text available
The long-term challenge of proteomics is enormous: to define the identities, quantities, structures and functions of complete complements of proteins, and to characterize how these properties vary in different cellular contexts. One critical step in tackling this goal is the generation of sets of clones that express a representative of each protein of a proteome in a useful format, followed by the analysis of these sets on a genome-wide basis. Such studies enable genetic, biochemical and cell biological technologies to be applied on a systematic level, leading to the assignment of biochemical activities, the construction of protein arrays, the identification of interactions, and the localization of proteins within cellular compartments.
Article
Full-text available
Elucidating the transcribed regions of the genome constitutes a fundamental aspect of human biology, yet this remains an outstanding problem. To comprehensively identify coding sequences, we constructed a series of high-density oligonucleotide tiling arrays representing sense and antisense strands of the entire nonrepetitive sequence of the human genome. Transcribed sequences were located across the genome via hybridization to complementary DNA samples, reverse-transcribed from polyadenylated RNA obtained from human liver tissue. In addition to identifying many known and predicted genes, we found 10,595 transcribed sequences not detected by other methods. A large fraction of these are located in intergenic regions distal from previously annotated genes and exhibit significant homology to other mammalian proteins.
Article
Full-text available
Motivation: Most of the existing methods in predicting protein subcellular location were used to deal with the cases limited within the scope from two to five localizations, and only a few of them can be effectively extended to cover the cases of 12-14 localizations. This is because the more the locations involved are, the poorer the success rate would be. Besides, some proteins may occur in several different subcellular locations, i.e. bear the feature of 'multiplex locations'. So far there is no method that can be used to effectively treat the difficult multiplex location problem. The present study was initiated in an attempt to address (1) how to efficiently identify the localization of a query protein among many possible subcellular locations, and (2) how to deal with the case of multiplex locations. Results: By hybridizing gene ontology, functional domain and pseudo amino acid composition approaches, a new method has been developed that can be used to predict subcellular localization of proteins with multiplex location feature. A global analysis of the proteins in budding yeast classified into 22 locations was performed by jack-knife cross-validation with the new method. The overall success identification rate thus obtained is 70%. In contrast to this, the corresponding rates obtained by some other existing methods were only 13-14%, indicating that the new method is very powerful and promising. Furthermore, predictions were made for the four proteins whose localizations could not be determined by experiments, as well as for the 236 proteins whose localizations in budding yeast were ambiguous according to experimental observations. However, according to our predicted results, many of these 'ambiguous proteins' were found to have the same score and ranking for several different subcellular locations, implying that they may simultaneously exist, or move around, in these locations. 
This finding is intriguing because it reflects the dynamic feature of these proteins in a cell that may be associated with some special biological functions.
Article
Full-text available
In the field of statistical discrimination k-nearest neighbor classification is a well-known, easy and successful method. In this paper we present an extended version of this technique, where the distances of the nearest neighbors can be taken into account. In this sense there is a close connection to LOESS, a local regression technique. In addition we show possibilities to use nearest neighbor for classification in the case of an ordinal class structure. Empirical studies show the advantages of the new techniques.
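The idea of letting neighbor distances influence the vote can be illustrated with a minimal sketch. It assumes inverse-distance weights and Euclidean distance, which are illustrative choices and not necessarily the weighting scheme of the paper; the function name `weighted_knn_predict`, the `eps` parameter, and the toy labels are hypothetical.

```python
import math
from collections import defaultdict

def weighted_knn_predict(train, query, k=3, eps=1e-9):
    """Classify `query` from `train`, a list of (feature_vector, label) pairs.

    Each of the k nearest neighbours votes with weight 1 / (distance + eps),
    an assumed weighting chosen only for illustration.
    """
    # Euclidean distance from the query to every training point, nearest first
    dists = sorted((math.dist(x, query), label) for x, label in train)
    votes = defaultdict(float)
    for d, label in dists[:k]:
        votes[label] += 1.0 / (d + eps)  # closer neighbours carry more weight
    # Return the label with the largest summed weight
    return max(votes, key=votes.get)

# Toy data: one very close "A" point and two more distant "B" points
train = [((0.1, 0.0), "A"), ((2.0, 0.0), "B"), ((2.1, 0.0), "B")]
prediction = weighted_knn_predict(train, (0.0, 0.0), k=3)
```

With these toy points an unweighted majority vote over three neighbors would favor "B", whereas the distance-weighted vote favors the single nearby "A" point.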
Article
Full-text available
We have determined the relationship between mRNA and protein expression levels for selected genes expressed in the yeast Saccharomyces cerevisiae growing at mid-log phase. The proteins contained in total yeast cell lysate were separated by high-resolution two-dimensional (2D) gel electrophoresis. Over 150 protein spots were excised and identified by capillary liquid chromatography-tandem mass spectrometry (LC-MS/MS). Protein spots were quantified by metabolic labeling and scintillation counting. Corresponding mRNA levels were calculated from serial analysis of gene expression (SAGE) frequency tables (V. E. Velculescu, L. Zhang, W. Zhou, J. Vogelstein, M. A. Basrai, D. E. Bassett, Jr., P. Hieter, B. Vogelstein, and K. W. Kinzler, Cell 88:243–251, 1997). We found that the correlation between mRNA and protein levels was insufficient to predict protein expression levels from quantitative mRNA data. Indeed, for some genes, while the mRNA levels were of the same value the protein levels varied by more than 20-fold. Conversely, invariant steady-state levels of certain proteins were observed with respective mRNA transcript levels that varied by as much as 30-fold. Another interesting observation is that codon bias is not a predictor of either protein or mRNA levels. Our results clearly delineate the technical boundaries of current approaches for quantitative analysis of protein expression and reveal that simple deduction from mRNA transcript analysis is insufficient.
Article
Full-text available
The RIKEN Mouse Gene Encyclopaedia Project, a systematic approach to determining the full coding potential of the mouse genome, involves collection and sequencing of full-length complementary DNAs and physical mapping of the corresponding genes to the mouse genome. We organized an international functional annotation meeting (FANTOM) to annotate the first 21,076 cDNAs to be analysed in this project. Here we describe the first RIKEN clone collection, which is one of the largest described for any organism. Analysis of these cDNAs extends known gene families and identifies new ones.
Article
Full-text available
We describe a largely unbiased method for rapid and large-scale proteome analysis by multidimensional liquid chromatography, tandem mass spectrometry, and database searching by the SEQUEST algorithm, named multidimensional protein identification technology (MudPIT). MudPIT was applied to the proteome of the Saccharomyces cerevisiae strain BJ5460 grown to mid-log phase and yielded the largest proteome analysis to date. A total of 1,484 proteins were detected and identified. Categorization of these hits demonstrated the ability of this technology to detect and identify proteins rarely seen in proteome analysis, including low-abundance proteins like transcription factors and protein kinases. Furthermore, we identified 131 proteins with three or more predicted transmembrane domains, which allowed us to map the soluble domains of many of the integral membrane proteins. MudPIT is useful for proteome analysis and may be specifically applied to integral membrane proteins to obtain detailed biochemical information on this unwieldy class of proteins.
Article
Full-text available
The placenta is the first organ to form during mammalian embryogenesis. Problems in its formation and function underlie many aspects of early pregnancy loss and pregnancy complications in humans. Because the placenta is critical for survival, it is very sensitive to genetic disruption, as reflected by the ever-increasing list of targeted mouse mutations that cause placental defects. Recent studies of mouse mutants with disrupted placental development indicate that signalling interactions between the placental trophoblast and embryonic cells have a key role in placental morphogenesis. Furthering our understanding of mouse trophoblast development should provide novel insights into human placental function.
Article
Full-text available
Although the mature neutrophil is one of the better characterized mammalian cell types, the mechanisms of myeloid differentiation are incompletely understood at the molecular level. A mouse promyelocytic cell line (MPRO), derived from murine bone marrow cells and arrested developmentally by a dominant-negative retinoic acid receptor, morphologically differentiates to mature neutrophils in the presence of 10 µM retinoic acid. An extensive catalog was prepared of the gene expression changes that occur during morphologic maturation. To do this, 3'-end differential display, oligonucleotide chip array hybridization, and 2-dimensional protein electrophoresis were used. A large number of genes whose mRNA levels are modulated during differentiation of MPRO cells were identified. The results suggest the involvement of several transcription regulatory factors not previously implicated in this process, but they also emphasize the importance of events other than the production of new transcription factors. Furthermore, gene expression patterns were compared at the level of mRNA and protein, and the correlation between 2 parameters was studied. (Blood. 2001;98:513-524)
Article
Full-text available
Protein localization data are a valuable information resource helpful in elucidating eukaryotic protein function. Here, we report the first proteome-scale analysis of protein localization within any eukaryote. Using directed topoisomerase I-mediated cloning strategies and genome-wide transposon mutagenesis, we have epitope-tagged 60% of the Saccharomyces cerevisiae proteome. By high-throughput immunolocalization of tagged gene products, we have determined the subcellular localization of 2744 yeast proteins. Extrapolating these data through a computational algorithm employing Bayesian formalism, we define the yeast localizome (the subcellular distribution of all 6100 yeast proteins). We estimate the yeast proteome to encompass approximately 5100 soluble proteins and >1000 transmembrane proteins. Our results indicate that 47% of yeast proteins are cytoplasmic, 13% mitochondrial, 13% exocytic (including proteins of the endoplasmic reticulum and secretory vesicles), and 27% nuclear/nucleolar. A subset of nuclear proteins was further analyzed by immunolocalization using surface-spread preparations of meiotic chromosomes. Of these proteins, 38% were found associated with chromosomal DNA. As determined from phenotypic analyses of nuclear proteins, 34% are essential for spore viability--a percentage nearly twice as great as that observed for the proteome as a whole. In total, this study presents experimentally derived localization data for 955 proteins of previously unknown function: nearly half of all functionally uncharacterized proteins in yeast. To facilitate access to these data, we provide a searchable database featuring 2900 fluorescent micrographs at http://ygac.med.yale.edu.
Article
Full-text available
Using an integrated genomic and proteomic approach, we have investigated the effects of carbon source perturbation on steady-state gene expression in the yeast Saccharomyces cerevisiae growing on either galactose or ethanol. For many genes, significant differences between the abundance ratio of the messenger RNA transcript and the corresponding protein product were observed. Insights into the perturbative effects on genes involved in respiration, energy generation, and protein synthesis were obtained that would not have been apparent from measurements made at either the messenger RNA or protein level alone, illustrating the power of integrating different types of data obtained from the same sample for the comprehensive characterization of biological systems and processes.
Article
Full-text available
We investigate the co-occurrence of domain families in eukaryotic proteins to predict protein cellular localization. Approximately half (300) of SMART domains form a "small-world network", linked by no more than seven degrees of separation. Projection of the domains onto two-dimensional space reveals three clusters that correspond to cellular compartments containing secreted, cytoplasmic, and nuclear proteins. The projection method takes into account the existence of "bridging" domains, that is, instances where two domains might not occur with each other but frequently co-occur with a third domain; in such circumstances the domains are neighbors in the projection. While the majority of domains are specific to a compartment ("locale"), and hence may be used to localize any protein that contains such a domain, a small subset of domains either are present in multiple locales or occur in transmembrane proteins. Comparison with previously annotated proteins shows that SMART domain data used with this approach can predict, with 92% accuracy, the localizations of 23% of eukaryotic proteins. The coverage and accuracy will increase with improvements in domain database coverage. This method is complementary to approaches that use amino-acid composition or identify sorting sequences; these methods may be combined to further enhance prediction accuracy.
Article
Full-text available
We have developed a systematic analytical approach, termed PRISM (Proteomic Investigation Strategy for Mammals), that permits routine, large scale protein expression profiling of mammalian cells and tissues. PRISM combines subcellular fractionation, multidimensional liquid chromatography-tandem mass spectrometry-based protein shotgun sequencing, and two newly developed computer algorithms, STATQUEST and GOClust, as a means to rapidly identify, annotate, and categorize thousands of expressed mammalian proteins. The application of PRISM to adult mouse lung and liver resulted in the high confidence identification of over 2,100 unique proteins including more than 100 integral membrane proteins, 400 nuclear proteins, and 500 uncharacterized proteins, the largest proteome study carried out to date on this important model organism. Automated clustering of the identified proteins into Gene Ontology annotation groups allowed for streamlined analysis of the large data set, revealing interesting and physiologically relevant patterns of tissue and organelle specificity. PRISM therefore offers an effective platform for in-depth investigation of complex mammalian proteomes.
Article
Full-text available
The Golgi complex functions to posttranslationally modify newly synthesized proteins and lipids and to sort them to their sites of function. In this study, a stacked Golgi fraction was isolated by classical cell fractionation, and the protein complement (the Golgi proteome) was characterized using multidimensional protein identification technology. Many of the proteins identified are known residents of the Golgi, and 64% of these are predicted transmembrane proteins. Proteins localized to other organelles also were identified, strengthening reports of functional interfacing between the Golgi and the endoplasmic reticulum and cytoskeleton. Importantly, 41 proteins of unknown function were identified. Two were selected for further analysis, and Golgi localization was confirmed. One of these, a putative methyltransferase, was shown to be arginine dimethylated, and upon further proteomic analysis, arginine dimethylation was identified on 18 total proteins in the Golgi proteome. This survey illustrates the utility of proteomics in the discovery of novel organellar functions and resulted in 1) a protein profile of an enriched Golgi fraction; 2) identification of 41 previously uncharacterized proteins, two with confirmed Golgi localization; 3) the identification of arginine dimethylated residues in Golgi proteins; and 4) a confirmation of methyltransferase activity within the Golgi fraction.
Article
Full-text available
The tissue-specific pattern of mRNA expression can indicate important clues about gene function. High-density oligonucleotide arrays offer the opportunity to examine patterns of gene expression on a genome scale. Toward this end, we have designed custom arrays that interrogate the expression of the vast majority of protein-encoding human and mouse genes and have used them to profile a panel of 79 human and 61 mouse tissues. The resulting data set provides the expression patterns for thousands of predicted genes, as well as known and poorly characterized genes, from mice and humans. We have explored this data set for global trends in gene expression, evaluated commonly used lines of evidence in gene prediction methodologies, and investigated patterns indicative of chromosomal organization of transcription. We describe hundreds of regions of correlated transcription and show that some are subject to both tissue and parental allele-specific expression, suggesting a link between spatial expression and imprinting.
Article
Full-text available
Determining the site of a regulatory phosphorylation event is often essential for elucidating specific kinase–substrate relationships, providing a handle for understanding essential signaling pathways and ultimately allowing insights into numerous disease pathologies. Despite intense research efforts to elucidate mechanisms of protein phosphorylation regulation, efficient, large-scale identification and characterization of phosphorylation sites remains an unsolved problem. In this report we describe an application of existing technology for the isolation and identification of phosphorylation sites. By using a strategy based on strong cation exchange chromatography, phosphopeptides were enriched from the nuclear fraction of HeLa cell lysate. From 967 proteins, 2,002 phosphorylation sites were determined by tandem MS. This unprecedented large collection of sites permitted a detailed accounting of known and unknown kinase motifs and substrates. Keywords: phosphorylation, mass spectrometry, strong cation exchange chromatography
Article
Full-text available
The prediction of subcellular localization of proteins from their primary sequence is a challenging problem in bioinformatics. We have created a Bayesian network localization predictor called PSLT that is based on the combinatorial presence of InterPro motifs and specific membrane domains in human proteins. This probabilistic framework generates a likelihood of localization to all organelles and allows prediction of multicompartmental proteins. When used to predict on nine compartments, PSLT achieves an accuracy of 78% as estimated by using a 10-fold cross-validation test and a coverage of 74%. When used to predict the localization of proteins from other closely related species, it achieves a prediction accuracy and a coverage >80%. We compared the localization predictions of PSLT to those determined through GFP-tagging and microscopy for a group of human proteins. We found two general classes of proteins that are mislocalized by the GFP-tagging strategy but are correctly localized by PSLT. This suggests that PSLT can be used in combination with experimental approaches for localization to identify proteins for which additional experimental validation is required. We used our predictor to annotate all 9793 human proteins from SWISS-PROT release 41.25, 16% of which are predicted by PSLT to be present in more than one compartment.
Article
Full-text available
To investigate the role of post-transcriptional controls in the regulation of protein expression for the malaria parasite, Plasmodium falciparum, we have compared mRNA transcript and protein abundance levels for seven different stages of the parasite life cycle. A moderately high positive relationship between mRNA and protein abundance was observed for these stages; the most common discrepancy was a delay between mRNA and protein accumulation. Potentially post-transcriptionally regulated genes are identified, and families of functionally related genes were observed to share similar patterns of mRNA and protein accumulation.
Article
Full-text available
Large-scale quantitative analysis of transcriptional co-expression has been used to dissect regulatory networks and to predict the functions of new genes discovered by genome sequencing in model organisms such as yeast. Although the idea that tissue-specific expression is indicative of gene function in mammals is widely accepted, it has not been objectively tested nor compared with the related but distinct strategy of correlating gene co-expression as a means to predict gene function. We generated microarray expression data for nearly 40,000 known and predicted mRNAs in 55 mouse tissues, using custom-built oligonucleotide arrays. We show that quantitative transcriptional co-expression is a powerful predictor of gene function. Hundreds of functional categories, as defined by Gene Ontology 'Biological Processes', are associated with characteristic expression patterns across all tissues, including categories that bear no overt relationship to the tissue of origin. In contrast, simple tissue-specific restriction of expression is a poor predictor of which genes are in which functional categories. As an example, the highly conserved mouse gene PWP1 is widely expressed across different tissues but is co-expressed with many RNA-processing genes; we show that the uncharacterized yeast homolog of PWP1 is required for rRNA biogenesis. We conclude that 'functional genomics' strategies based on quantitative transcriptional co-expression will be as fruitful in mammals as they have been in simpler organisms, and that transcriptional control of mammalian physiology is more modular than is generally appreciated. Our data and analyses provide a public resource for mammalian functional genomics.
Article
Full-text available
InterPro, an integrated documentation resource of protein families, domains and functional sites, was created to integrate the major protein signature databases. Currently, it includes PROSITE, Pfam, PRINTS, ProDom, SMART, TIGRFAMs, PIRSF and SUPERFAMILY. Signatures are manually integrated into InterPro entries that are curated to provide biological and functional information. Annotation is provided in an abstract, Gene Ontology mapping and links to specialized databases. New features of InterPro include extended protein match views, taxonomic range information and protein 3D structure data. One of the new match views is the InterPro Domain Architecture view, which shows the domain composition of protein matches. Two new entry types were introduced to better describe InterPro entries: these are active site and binding site. PIRSF and the structure-based SUPERFAMILY are the latest member databases to join InterPro, and CATH and PANTHER are soon to be integrated. InterPro release 8.0 contains 11 007 entries, representing 2573 domains, 8166 families, 201 repeats, 26 active sites, 21 binding sites and 20 post-translational modification sites. InterPro covers over 78% of all proteins in the Swiss-Prot and TrEMBL components of UniProt. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro).
Article
Full-text available
Proteomics is potentially a powerful technology for elucidating brain function and neurodegenerative diseases. So far, the brain proteome has generally been analyzed by two-dimensional gel electrophoresis, which usually leads to the complete absence of membrane proteins. We describe a proteomic approach for profiling of plasma membrane proteins from mouse brain. The procedure consists of a novel method for extraction and fractionation of membranes, on-membrane digestion, diagonal separation of peptides, and high-sensitivity analysis by advanced MS. Breaking with the classical plasma membrane fractionation approach, membranes are isolated without cell compartment isolation, by stepwise depletion of nonmembrane molecules from entire tissue homogenate by high-salt, carbonate, and urea washes followed by treatment of the membranes with sublytic concentrations of digitonin. Plasma membrane is further enriched by density gradient fractionation and protein is digested on-membrane by endoproteinase Lys-C. Released peptides are separated, fractions digested by trypsin, and analyzed by LC-MS/MS. In single experiments, the developed technology enabled identification of 862 proteins from 150 mg of mouse brain cortex. Further development and miniaturization allowed analysis of 15 mg of hippocampus, revealing 1,685 proteins. More than 60% of the identified proteins are membrane proteins, including several classes of ion channels and neurotransmitter receptors. Our work now allows in-depth study of brain membrane proteomes, such as of mouse models of neurological disease.
Article
Full-text available
Motivation: Identifying the destination or localization of proteins is key to understanding their function and facilitating their purification. A number of existing computational prediction methods are based on sequence analysis. However, these methods are limited in scope, accuracy and most particularly breadth of coverage. Rather than using sequence information alone, we have explored the use of database text annotations from homologs and machine learning to substantially improve the prediction of subcellular location. Results: We have constructed five machine-learning classifiers for predicting subcellular localization of proteins from animals, plants, fungi, Gram-negative bacteria and Gram-positive bacteria, which are 81% accurate for fungi and 92–94% accurate for the other four categories. These are the most accurate subcellular predictors across the widest set of organisms ever published. Our predictors are part of the Proteome Analyst web-service. Availability: http://www.cs.ualberta.ca/~bioinfo/PA/Sub, http://www.cs.ualberta.ca/~bioinfo/PA Supplementary information: http://www.cs.ualberta.ca/~bioinfo/PA/Subcellular
Article
To automate examination of massive amounts of sequence data for biological function, it is important to computerize interpretation based on empirical knowledge of sequence-function relationships. For this purpose, we have been constructing a knowledge base by organizing various experimental and computational observations as a collection of if-then rules. Here we report an expert system, which utilizes this knowledge base, for predicting localization sites of proteins only from the information on the amino acid sequence and the source origin. We collected data for 401 eukaryotic proteins with known localization sites (subcellular and extracellular) and divided them into training data and testing data. Fourteen localization sites were distinguished for animal cells and 17 for plant cells. When sorting signals were not well characterized experimentally, various sequence features were computationally derived from the training data. It was found that 66% of the training data and 59% of the testing data were correctly predicted by our expert system. This artificial intelligence approach is powerful and flexible enough to be used in genome analyses.
Article
A new P450 responsible for mutagenic activation of 3-methoxy-4-aminoazobenzene (3-MeO-AAB) which is a potent procarcinogen was purified from renal microsomes of male mice using an index of umu gene expression. The purified P450 had high bioactivation toward 3-MeO-AAB and also 2-aminofluorene and 2-aminoanthracene. The antibody against this P450 completely inhibited mutagenic activation of 3-MeO-AAB of mouse renal microsomes. With immunoblotting, this form was present abundantly in renal microsomes of male mice but not in those of female mice. This P450 was also present in pulmonary microsomes of male and female mice but not in hepatic microsomes. The NH2-terminal amino acid sequence analysis indicated that this form belonged to the CYP4B subfamily. Thus, mouse kidney cDNA library was screened with rat CYP4B1 probe. The cDNA-deduced amino acid sequence of isolated cDNA consisted of 511 amino acids and bore 90, 86, and 84% similarities to rat, rabbit, and human CYP4B1, respectively. The NH2-terminal amino acid sequence of the purified renal P450 and amino acid sequence of BrCN-digested peptides from the purified P450 agreed with the cDNA-deduced amino acid sequence. These results suggest that CYP4B1 is a major form in renal microsomes of male mice and plays a major role in mutagenic activation of 3-MeO-AAB. In extrahepatic tissue, CYP4B1 may contribute to chemical carcinogenesis.
Article
Sequences of intracellular and extracellular soluble proteins were analyzed statistically in terms of amino acid composition and residue-pair frequencies. Residue-pair frequencies were calculated for sequential separations from (n, n + 1) to (n, n + 5), and converted into scoring parameters. Then, for each test protein, the single-residue and residue-pair parameters were applied to calculate a total score. According to our definition, a protein which yields a positive score is indicative of an intracellular protein, whereas a negative score implies an extracellular one. The parameter set was derived from 894 sequences constituting different protein families in the PIR database, and assessed by application to a test of 379 proteins. The results showed that 88% of intracellular and 84% of extracellular proteins were correctly assigned. The discrimination power was improved by about 8% in comparison with the previous study, which used composition data alone. Segregation of intra/extracellular proteins is also observed by other criteria, such as structural class (intracellular proteins prefer alpha and alpha/beta types and extracellular proteins prefer beta and alpha + beta types). Segregation by sequence was found to be a more reliable procedure for distinguishing intra/extracellular proteins than methods using structural class. Possible causes for this segregation by sequence are discussed.
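The scoring rule described in this abstract (sum single-residue and residue-pair parameters, then read the sign of the total) can be sketched as follows. The parameter values, the dictionary layout, and the function names are hypothetical placeholders; the abstract does not state how frequencies are converted into parameters, so the sketch takes the parameters as given. Treating a score of exactly zero as extracellular is also an assumption.

```python
def total_score(seq, single, pair, max_sep=5):
    """Sum single-residue and residue-pair parameters over a sequence.

    single: dict residue -> parameter
    pair:   dict (separation, residue_a, residue_b) -> parameter,
            for separations (n, n + 1) up to (n, n + max_sep)
    Missing entries contribute 0 (an assumption of this sketch).
    """
    score = sum(single.get(res, 0.0) for res in seq)
    for sep in range(1, max_sep + 1):
        for i in range(len(seq) - sep):
            score += pair.get((sep, seq[i], seq[i + sep]), 0.0)
    return score

def classify(seq, single, pair):
    """Positive total score -> intracellular; otherwise extracellular."""
    return "intracellular" if total_score(seq, single, pair) > 0 else "extracellular"

# Hypothetical toy parameters, not values from the paper
single = {"K": 0.5, "C": -0.7}
pair = {(1, "K", "K"): 0.2, (2, "C", "C"): -0.3}

label_a = classify("KKA", single, pair)
label_b = classify("CAC", single, pair)
```

In this toy setting "KKA" accumulates positive single-residue and pair terms and is labeled intracellular, while "CAC" accumulates negative terms and is labeled extracellular.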
Article
A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression. The output is displayed graphically, conveying the clustering and the underlying expression data simultaneously in a form intuitive for biologists. We have found in the budding yeast Saccharomyces cerevisiae that clustering gene expression data efficiently groups together genes of known similar function, and we find a similar tendency in human data. Thus patterns seen in genome-wide expression experiments can be interpreted as indications of the status of cellular processes. Also, coexpression of genes of known function with poorly characterized or novel genes may provide a simple means of gaining leads to the functions of many genes for which information is not currently available.
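As a rough illustration of arranging genes by similarity of expression pattern, the sketch below runs agglomerative clustering in plain Python. The abstract only says "standard statistical algorithms", so the choice of Pearson correlation as the similarity measure and average linkage as the merge rule is an assumption, as are the function names and the toy profiles.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles):
    """Repeatedly merge the two most similar clusters.

    profiles maps gene name -> list of expression values.
    Returns the merges in order as (members_a, members_b, similarity).
    """
    clusters = {name: [name] for name in profiles}
    merges = []
    while len(clusters) > 1:
        keys = list(clusters)
        best = None
        for i in range(len(keys)):
            for j in range(i + 1, len(keys)):
                a, b = clusters[keys[i]], clusters[keys[j]]
                # Average pairwise correlation between the two clusters.
                sim = sum(
                    pearson(profiles[p], profiles[q]) for p in a for q in b
                ) / (len(a) * len(b))
                if best is None or sim > best[0]:
                    best = (sim, keys[i], keys[j])
        sim, ka, kb = best
        merges.append((tuple(clusters[ka]), tuple(clusters[kb]), sim))
        clusters[ka] = clusters[ka] + clusters.pop(kb)
    return merges


# Hypothetical toy profiles: g1/g2 rise together, g3/g4 fall together.
toy_profiles = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 9],
    "g3": [4, 3, 2, 1],
    "g4": [9, 6, 4, 2],
}
toy_merges = average_linkage(toy_profiles)
```

The order of merges defines the tree that a viewer would draw next to the expression heat map. This quadratic search is only suitable for small examples.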
Article
We describe and validate a new membrane protein topology prediction method, TMHMM, based on a hidden Markov model. We present a detailed analysis of TMHMM's performance, and show that it correctly predicts 97-98% of the transmembrane helices. Additionally, TMHMM can discriminate between soluble and membrane proteins with both specificity and sensitivity better than 99%, although the accuracy drops when signal peptides are present. This high degree of accuracy allowed us to predict reliably integral membrane proteins in a large collection of genomes. Based on these predictions, we estimate that 20-30% of all genes in most genomes encode membrane proteins, which is in agreement with previous estimates. We further discovered that proteins with N(in)-C(in) topologies are strongly preferred in all examined organisms, except Caenorhabditis elegans, where the large number of 7TM receptors increases the counts for N(out)-C(in) topologies. We discuss the possible relevance of this finding for our understanding of membrane protein assembly mechanisms. A TMHMM prediction service is available at http://www.cbs.dtu.dk/services/TMHMM/.
Article
At certain junctures in development, gene transcription is coupled to the completion of landmark morphological events. We refer to this dependence on morphogenesis for gene expression as "morphological coupling." Three examples of morphological coupling in prokaryotes are reviewed in which the activation of a transcription factor is tied to the assembly of a critically important structure in development.
Article
The nucleolus is a subnuclear organelle containing the ribosomal RNA gene clusters and ribosome biogenesis factors. Recent studies suggest it may also have roles in RNA transport, RNA modification, and cell cycle regulation. Despite over 150 years of research into nucleoli, many aspects of their structure and function remain uncharacterized. We report a proteomic analysis of human nucleoli. Using a combination of mass spectrometry (MS) and sequence database searches, including online analysis of the draft human genome sequence, 271 proteins were identified. Over 30% of the nucleolar proteins were encoded by novel or uncharacterized genes, while the known proteins included several unexpected factors with no previously known nucleolar functions. MS analysis of nucleoli isolated from HeLa cells in which transcription had been inhibited showed that a subset of proteins was enriched. These data highlight the dynamic nature of the nucleolar proteome and show that proteins can either associate with nucleoli transiently or accumulate only under specific metabolic conditions. This extensive proteomic analysis shows that nucleoli have a surprisingly large protein complexity. The many novel factors and separate classes of proteins identified support the view that the nucleolus may perform additional functions beyond its known role in ribosome subunit biogenesis. The data also show that the protein composition of nucleoli is not static and can alter significantly in response to the metabolic state of the cell.
Article
The existence of several technologies for measuring gene expression makes the question of cross-technology agreement of measurements an important issue. Cross-platform utilization of data from different technologies has the potential to reduce the need to duplicate experiments but requires corresponding measurements to be comparable. A comparison of mRNA measurements of 2895 sequence-matched genes in 56 cell lines from the standard panel of 60 cancer cell lines from the National Cancer Institute (NCI 60) was carried out by calculating correlation between matched measurements and calculating concordance between clusters from two high-throughput DNA microarray technologies, Stanford type cDNA microarrays and Affymetrix oligonucleotide microarrays. In general, corresponding measurements from the two platforms showed poor correlation. Clusters of genes and cell lines were discordant between the two technologies, suggesting that relative intra-technology relationships were not preserved. GC-content, sequence length, average signal intensity, and an estimator of cross-hybridization were found to be associated with the degree of correlation. This suggests gene-specific, or more correctly probe-specific, factors influencing measurements differently in the two platforms, implying a poor prognosis for a broad utilization of gene expression measurements across platforms.
Article
Recent successes illustrate the role of mass spectrometry-based proteomics as an indispensable tool for molecular and cellular biology and for the emerging field of systems biology. These include the study of protein-protein interactions via affinity-based isolations on a small and proteome-wide scale, the mapping of numerous organelles, the concurrent description of the malaria parasite genome and proteome, and the generation of quantitative protein profiles from diverse species. The ability of mass spectrometry to identify and, increasingly, to precisely quantify thousands of proteins from complex samples can be expected to impact broadly on biology and medicine.
Article
Highly complex protein mixtures can be directly analyzed after proteolysis by liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS). In this paper, we have utilized the combination of strong cation exchange (SCX) and reversed-phase (RP) chromatography to achieve two-dimensional separation prior to MS/MS. One milligram of whole yeast protein was proteolyzed and separated by SCX chromatography (2.1 mm i.d.) with fraction collection every minute during an 80-min elution. Eighty fractions were reduced in volume and then re-injected via an autosampler in an automated fashion using a vented-column (100 µm i.d.) approach for RP-LC-MS/MS analysis. More than 162,000 MS/MS spectra were collected with 26,815 matched to yeast peptides (7,537 unique peptides). A total of 1,504 yeast proteins were unambiguously identified in this single analysis. We present a comparison of this experiment with a previously published yeast proteome analysis by Yates and colleagues (Washburn, M. P.; Wolters, D.; Yates, J. R., III. Nat. Biotechnol. 2001, 19, 242-7). In addition, we report an in-depth analysis of the false-positive rates associated with peptide identification using the Sequest algorithm and a reversed yeast protein database. New criteria are proposed to decrease false-positives to less than 1% and to greatly reduce the need for manual interpretation while permitting more proteins to be identified.
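The idea of using a reversed database to estimate false positives can be sketched as follows. The abstract does not give a formula, so the estimator below is an assumption: it is one common convention for searches against a combined forward-plus-reversed database, in which the estimated rate is twice the reversed hits divided by all accepted hits. The function names and the toy scores are hypothetical. The 1% target is the only value taken from the abstract.

```python
def estimated_false_positive_rate(psms, threshold):
    """Estimate the false-positive rate among matches scoring >= threshold.

    psms is a list of (score, is_reversed) tuples, where is_reversed marks
    a match to the reversed (decoy) database.
    Assumed estimator: 2 * reversed hits / all accepted hits.
    """
    kept = [is_reversed for score, is_reversed in psms if score >= threshold]
    if not kept:
        return 0.0
    return 2.0 * sum(kept) / len(kept)


def loosest_threshold(psms, max_rate=0.01):
    """Return the lowest observed score cutoff whose estimated rate is
    within max_rate, or None if no cutoff qualifies."""
    best = None
    for t in sorted({score for score, _ in psms}, reverse=True):
        if estimated_false_positive_rate(psms, t) <= max_rate:
            best = t
    return best


# Hypothetical toy matches: (score, matched the reversed database?)
toy_psms = [
    (5.0, False), (4.5, False), (4.0, False), (3.5, False),
    (3.0, True), (2.5, False), (2.0, True),
]
toy_cutoff = loosest_threshold(toy_psms, max_rate=0.01)
```

In practice the filtering criteria would combine several score dimensions rather than a single cutoff. The sketch keeps one score to make the counting explicit.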
Article
Proteomics is a powerful tool for screening brain protein expression, but the methodology is hampered by the low abundance of some proteins, by compartmentalization, and by overloading with high-abundance proteins. The aim of the study was therefore to determine the expression of brain proteins using enriched cellular subfractions and pre-electrophoretic chromatographic separation of brain homogenates. We used two-dimensional electrophoresis with subsequent matrix-assisted laser desorption/ionization (MALDI) detection and characterization of brain proteins. Subfractionation into cytosolic, mitochondrial and microsomal compartments was performed by ultracentrifugation. Pre-electrophoretic fractionation of the cytosolic fractions was carried out by ion exchange column chromatography. We detected and identified 437 proteins in rat brain and showed proteins specific for the individual subcellular compartments. These included housekeeping, signaling, cytoskeletal, intermediary metabolism, and antioxidant proteins on the one hand and neuron- and synaptosome-specific proteins on the other. By fractionating brain homogenates we improved the power of the method, forming a basis for brain protein expression studies and providing a reference map for the neuroscientist.
Article
To comprehensively identify integral membrane proteins of the nuclear envelope (NE), we prepared separately NEs and organelles known to cofractionate with them from liver. Proteins detected by multidimensional protein identification technology in the cofractionating organelles were subtracted from the NE data set. In addition to all 13 known NE integral proteins, 67 uncharacterized open reading frames with predicted membrane-spanning regions were identified. All of the eight proteins tested targeted to the NE, indicating that there are substantially more integral proteins of the NE than previously thought. Furthermore, 23 of these mapped within chromosome regions linked to a variety of dystrophies.
Article
Motivation: The subcellular location of a protein is closely correlated to its function. Thus, computational prediction of subcellular locations from the amino acid sequence information would help annotation and functional prediction of protein coding genes in complete genomes. We have developed a method based on support vector machines (SVMs). Results: We considered 12 subcellular locations in eukaryotic cells: chloroplast, cytoplasm, cytoskeleton, endoplasmic reticulum, extracellular medium, Golgi apparatus, lysosome, mitochondrion, nucleus, peroxisome, plasma membrane, and vacuole. We constructed a data set of proteins with known locations from the SWISS-PROT database. A set of SVMs was trained to predict the subcellular location of a given protein based on its amino acid, amino acid pair, and gapped amino acid pair compositions. The predictors based on these different compositions were then combined using a voting scheme. Results obtained through 5-fold cross-validation tests showed an improvement in prediction accuracy over the algorithm based on the amino acid composition only. This prediction method is available via the Internet.
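The three sequence representations named in this abstract, and the voting step that combines the predictors, can be sketched in plain Python. The SVM training itself is omitted. The feature normalisation, the tie-breaking behaviour of the vote, and all function names are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = list(product(AMINO_ACIDS, repeat=2))  # 400 ordered residue pairs


def aa_composition(seq):
    """20 features: fraction of each amino acid in the sequence."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def pair_composition(seq, gap=0):
    """400 features: fraction of each ordered residue pair.

    gap=0 gives adjacent pairs; gap=k gives pairs with k residues between
    them (the "gapped" composition).
    """
    step = gap + 1
    counts = Counter((seq[i], seq[i + step]) for i in range(len(seq) - step))
    total = max(sum(counts.values()), 1)  # avoid division by zero
    return [counts[p] / total for p in PAIRS]


def majority_vote(predictions):
    """Combine per-predictor labels; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]
```

Each feature vector would be fed to its own trained classifier, and `majority_vote` would then be applied to the list of predicted locations, for example `majority_vote(["nucleus", "cytoplasm", "nucleus"])`.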
Article
The recent finding that the human genome comprises between 21000 and 39000 genes, a number much lower than expected, has in no way simplified the complexity associated with the understanding of how cells perform their functions. Elucidation of the molecular mechanisms underlying cell functions will require a global knowledge of the expressed proteins, including splice variant products, their post-translational modifications, their subcellular localizations and their assembly into molecular machines as deduced from protein-protein interactions, at any given time during the life of the cell or under any cellular conditions. Current and expected advances in mass spectrometry and bioinformatics might help the realization of these goals in a shorter time than is currently predicted.
Article
Mitochondria are tailored to meet the metabolic and signaling needs of each cell. To explore their molecular composition, we performed a proteomic survey of mitochondria from mouse brain, heart, kidney, and liver and combined the results with existing gene annotations to produce a list of 591 mitochondrial proteins, including 163 proteins not previously associated with this organelle. The protein expression data were largely concordant with large-scale surveys of RNA abundance and both measures indicate tissue-specific differences in organelle composition. RNA expression profiles across tissues revealed networks of mitochondrial genes that share functional and regulatory mechanisms. We also determined a larger "neighborhood" of genes whose expression is closely correlated to the mitochondrial genes. The combined analysis identifies specific genes of biological interest, such as candidates for mtDNA repair enzymes, offers new insights into the biogenesis and ancestry of mammalian mitochondria, and provides a framework for understanding the organelle's contribution to human disease.
Article
Open source software encourages innovation by allowing users to extend the functionality of existing applications. Treeview is a popular application for the visualization of microarray data, but is closed-source and platform-specific, which limits both its current utility and suitability as a platform for further development. Java Treeview is an open-source, cross-platform rewrite that handles very large datasets well, and supports extensions to the file format that allow the results of additional analysis to be visualized and compared. The combination of a general file format and open source makes Java Treeview an attractive choice for solving a class of visualization problems. An applet version is also available that can be used on any website with no special server-side setup. Availability: http://jtreeview.sourceforge.net under GPL.
Article
We describe improvements of the currently most popular method for prediction of classically secreted proteins, SignalP. SignalP consists of two different predictors based on neural network and hidden Markov model algorithms, where both components have been updated. Motivated by the idea that the cleavage site position and the amino acid composition of the signal peptide are correlated, new features have been included as input to the neural network. This addition, combined with a thorough error-correction of a new data set, has improved the performance of the predictor significantly over SignalP version 2. In version 3, correctness of the cleavage site predictions has increased notably for all three organism groups, eukaryotes, Gram-negative and Gram-positive bacteria. The accuracy of cleavage site prediction has increased in the range 6-17% over the previous version, whereas the signal peptide discrimination improvement is mainly due to the elimination of false-positive predictions, as well as the introduction of a new discrimination score for the neural network. The new method has been benchmarked against other available methods. Predictions can be made at the publicly available web server
Article
Proteomic analysis of complex protein mixtures using proteolytic digestion and liquid chromatography in combination with tandem mass spectrometry is a standard approach in biological studies. Data-dependent acquisition is used to automatically acquire tandem mass spectra of peptides eluting into the mass spectrometer. In more complicated mixtures, for example, whole cell lysates, data-dependent acquisition incompletely samples among the peptide ions present rather than acquiring tandem mass spectra for all ions available. We analyzed the sampling process and developed a statistical model to accurately predict the level of sampling expected for mixtures of a specific complexity. The model also predicts how many analyses are required for saturated sampling of a complex protein mixture. For a yeast-soluble cell lysate 10 analyses are required to reach a 95% saturation level on protein identifications based on our model. The statistical model also suggests a relationship between the level of sampling observed for a protein and the relative abundance of the protein in the mixture. We demonstrate a linear dynamic range over 2 orders of magnitude by using the number of spectra (spectral sampling) acquired for each protein.
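Two quantities discussed in this abstract, spectral sampling per protein and the saturation of identifications over repeated analyses, can be tabulated with a few lines of Python. This is a bookkeeping sketch only: it is not the statistical model the abstract refers to, and the function names and toy inputs are hypothetical.

```python
def spectral_fractions(spectral_counts):
    """Share of all acquired spectra assigned to each protein.

    spectral_counts maps protein id -> number of MS/MS spectra matched.
    The share is used here as a rough proxy for relative abundance.
    """
    total = sum(spectral_counts.values())
    return {protein: n / total for protein, n in spectral_counts.items()}


def cumulative_identifications(runs):
    """Number of distinct proteins seen after each successive analysis.

    runs is a list of sets of protein ids, one set per replicate analysis.
    A curve that flattens suggests sampling is approaching saturation.
    """
    seen = set()
    curve = []
    for run in runs:
        seen |= set(run)
        curve.append(len(seen))
    return curve
```

For example, `cumulative_identifications([{"A", "B"}, {"B", "C"}, {"A", "C"}])` gives the empirical curve that a saturation model of this kind would be fitted to.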
Article
We describe the application of a microarray platform, which combines information from exon body and splice-junction probes, to perform a quantitative analysis of tissue-specific alternative splicing (AS) for thousands of exons in mammalian cells. Through this system, we have analyzed global features of AS in major mouse tissues. The results provide numerous inferences for the functions of tissue-specific AS, insights into how the evolutionary history of exons can impact on their inclusion levels, and also information on how global regulatory properties of AS define tissue type. Like global transcription profiles, global AS profiles reflect tissue identity. Interestingly, we find that transcription and AS act independently on different sets of genes in order to define tissue-specific expression profiles. These results demonstrate the utility of our quantitative microarray platform and data for revealing important global regulatory features of AS.