Fig 7 - available from: BMC Genomics
cBioPortal – Harvest integration architecture. The integrated query tool supports back-and-forth searching between genes of interest, visualized in cBioPortal, and the tool for phenotype and specimen requests. We perform this integration in the same way as for the other tools in the toolkit, combining purpose-built web endpoints with traditional ETL. The cancer genomics integration starts with a scripted pull of mutation data over a secure database connection, using elements of cBioPortal's relational data structure to store this large data set. The bioinformatics team loads specimens known to the repository into cBioPortal under their specimen identifiers from the LIMS. This creates a natural link between granular genomic data, the specimen, and ultimately the subject. URLs constructed in the query platform let researchers move from clinical and specimen-driven queries directly to cBioPortal to visualize mutation data of interest.
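The linking described in the caption can be illustrated with a minimal sketch. It assumes the LIMS export is a list of records with `specimen_id` and `subject_id` fields, and that the portal accepts a query-string link. The base URL, the `/results` path, and the parameter names `study_id`, `sample_ids`, and `genes` are hypothetical placeholders, not the actual cBioPortal or Harvest interface.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical portal address; path and parameter names below are illustrative only.
PORTAL_BASE_URL = "https://cbioportal.example.org"


def link_specimens_to_subjects(lims_rows):
    """Index LIMS records so each specimen identifier maps to its subject."""
    return {row["specimen_id"]: row["subject_id"] for row in lims_rows}


def build_portal_url(study_id, specimen_ids, genes):
    """Build a link from a specimen-driven query result to a portal view."""
    params = {
        "study_id": study_id,
        "sample_ids": ",".join(sorted(specimen_ids)),
        "genes": ",".join(genes),
    }
    return f"{PORTAL_BASE_URL}/results?{urlencode(params)}"


def specimens_in_url(url):
    """Recover the specimen identifiers encoded in a portal link."""
    query = parse_qs(urlsplit(url).query)
    return query["sample_ids"][0].split(",")


def subjects_for_url(url, specimen_index):
    """Walk back from a portal link to the subjects behind its specimens."""
    return {specimen_index[s] for s in specimens_in_url(url)}
```

Because both directions share the specimen identifier as the join key, a link built from a clinical query can be resolved back to subjects without storing any extra mapping in the portal itself.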

Source publication
Article
Background High throughput molecular sequencing and increased biospecimen variety have introduced significant informatics challenges for research biorepository infrastructures. We applied a modular system integration approach to develop an operational biorepository management system. This method enables aggregation of the clinical, specimen and gen...

Citations

... Unfortunately, commercial vendors supporting the clinical workspace do not integrate nor interoperate well with the LIMS systems provided by commercial vendors for biobanking, and this is another developmental area in informatics via open-source systems needed to move precision oncology forward. There is much innovation around the use of the Vanderbilt University Realtime Electronic Data Capture (REDCap) system as a flexible tool for accomplishing these innovations [77][78][79][80]. and store, harmonize, and securely transfer the data to research networks following FAIR principles. ...
Chapter
The past three decades catapulted biorepositories and their informatics innovation to the forefront of Precision Medicine and particularly Precision Oncology. The National Institutes of Health (NIH)'s programs, particularly the All of Us Research Program and the Biden Cancer Moonshot initiatives, have been key enablers. Biospecimens and their derivatives, particularly genomic sequencing and expression data coupled with deep clinical annotation from electronic health records, are fueling a new era of deep biologic interrogation of both the cell biology of human tissues and their diseased counterparts. Clear evidence of this is the Chan-Zuckerberg BioHub (CZ BioHub), the Human Biomolecular Atlas (HuBMAP), the Human Tumor Atlas Network (HTAN), the Cellular Senescence Network (SenNet), and international initiatives such as LifeTime. Biorepositories that support spatial biology and cellular microenvironment imaging efforts are fueling deeper understanding of the host microenvironment and how Precision Medicine/Oncology can lead to more effective therapies. The tools of the next-generation biorepository include single cell genomics, whole-slide imaging, and multiplexed analyses to understand cell-cell messaging. Thus, “next-gen” tools are positioned to advance a deeper understanding of therapies that can be used to exploit cellular and molecular interactions within disease tissues. Biorepositories like the National Mesothelioma Virtual Bank (NMVB) are striving to provide an example of what a traditional biorepository can do to become a “Next-Generation” biorepository that not only provides clinical samples with extensive clinical annotations but also enables access to genomic, imaging, and informatics data and supports experimental innovation.
This chapter hopes to establish a vision for biorepository informatics fueled by innovative approaches that enable efficient, cost-effective, and sustainable models for advancing biomedical discovery and clinical translation.
Keywords: Best practices; Biobank; BioHub; Biorepository; Biospecimen; Cancer vaccines; Computational biology; EHRs (Electronic Health Records); Genomics; HTAN; HuBMAP; Imaging; Immunotherapy; Informatics; Multiplexed imaging; National Mesothelioma Virtual Bank; REDCap; SenNet; Single-cell genomics; Social determinants of health; Spatial biology; Spatio-temporal analysis; Standards; Systems biology; Whole slide imaging
... Replication was performed in a separate germline genetic dataset (n cases = 270; 210 low-grade and 54 high-grade gliomas, n controls = 2,080) including cases from the Children's Brain Tumor Network (CBTN) study, the Gabriella Miller Kids First study (BASIC3), the Pacific Pediatric Neuro-Oncology Consortium (PNOC) 36 and controls from the GICC Study 24. A formal replication p-value of 0.05 was used for the single detected genome-wide significant locus. ...
Article
Background: While recent sequencing studies have revealed that 10% of childhood gliomas are caused by rare germline mutations, the role of common variants is undetermined and no genome-wide significant risk loci for pediatric CNS tumors have been identified to date. Methods: Meta-analysis of three population-based genome-wide association studies (GWASs) comprising 4,069 children with glioma and 8,778 controls of multiple genetic ancestries. Replication was performed in a separate case-control cohort. Quantitative trait loci analyses and a transcriptome-wide association study were conducted to assess possible links with brain tissue expression across 18,628 genes. Results: Common variants in CDKN2B-AS1 at 9p21.3 were significantly associated with astrocytoma, the most common subtype of glioma in children (rs573687, p-value 6.974e-10, OR 1.273, CI95 1.179-1.374). The association was driven by low-grade astrocytoma (p-value 3.815e-9) and exhibited unidirectional effects across all six genetic ancestries. For glioma overall, the association approached genome-wide significance (rs3731239, p-value 5.411e-8), while no significant association was observed for high-grade tumors. Predicted decreased brain tissue expression of CDKN2B was significantly associated with astrocytoma (p-value 8.090e-8). Conclusion: In this population-based GWAS meta-analysis, we identify and replicate 9p21.3 (CDKN2B-AS1) as a risk locus for childhood astrocytoma, thereby establishing the first genome-wide significant evidence of common variant predisposition in pediatric neuro-oncology. We furthermore provide a functional basis for the association by showing a possible link to decreased brain tissue CDKN2B expression and substantiate that genetic susceptibility differs between low- and high-grade astrocytoma.
... The authors of [18] proposed a common data model to serve as a data hub for data brokering processes, which improves data accessibility and availability. Works such as [19] that aim to improve data protection automate the data de-identification process and centralize the IRB request evaluation. While the above works expedite the data brokering function, all of them require human intervention to evaluate requests and manually approve or deny them. ...
Article
Healthcare innovations are increasingly becoming reliant on high variety and standards-compliant (e.g., HIPAA, common data model) distributed data sets that enable predictive analytics. Consequently, health information systems need to be developed using cooperation and distributed trust principles to allow protected data sharing between multiple domains or entities (e.g., health data service providers, hospitals and research labs). In this paper, we present a novel health information sharing system viz., HonestChain that uses Blockchain technology to allow organizations to have incentive-based and trustworthy cooperation to either access or provide protected healthcare records. More specifically, we use a consortium Blockchain approach coupled with chatbot guided interfaces that allow data requesters to: (a) comply with data access standards, and (b) allow them to gain reputation in a consortium. We also propose a reputation scheme for creation and sustenance of the consortium with peers using Requester Reputation and Provider Reputation metrics. We evaluate HonestChain using Hyperledger Composer in a realistic simulation testbed on a public cloud infrastructure. Our results show that our HonestChain performs better than the state-of-the-art requester reputation schemes for data request handling, while choosing the most appropriate provider peers. We particularly show that HonestChain achieves a better tradeoff in metrics such as service time and request resubmission rate. Additionally, we also demonstrate the scalability of our consortium platform in terms of the Blockchain transaction times.
... Finally, 1 article was replaced by a more updated publication [265]. These 247 articles were combined with the targeted web-based search [32][39][40][41][266][267][268][269]; hence, we identified a total of 255 articles (Figure 1). The most frequent words in the articles were system, information, study, project, and design (Multimedia Appendix 1: A4. , which was developed to automatically load data from a clinical data repository into a standard data model that researchers can query; it is a successful example of fast data upload and query using data structures designed from standard data models available for clinical research. ...
... The main feature is that the model allows the biosample operational user to access the raw and identified biobank data source for quality control and biosample management. An example of a biobank-driven structure is the biorepository portal (BRP) [41,266], which allows for the automatic integration of biosamples with clinical data, while maintaining unrestricted access to the biorepository for the operational team. The Mayo Clinic and Vanderbilt University adopt the general and biobank-driven architecture models in parallel. ...
... The data processing involved in extraction, transformation, and loading (ETL) is described in detail in the articles of biomedical translational research information system (BTRIS) [14], HaMSTR [24], Mayo Translational Research Center (TRC) [38], CARPEM [28], onco-i2b2, Vanderbilt's Synthetic Derivative [39] and BioVU [40], and BRP [41]. These IDRs represent the general and biobank-driven architecture models, which implement a staging layer for the ETL process. ...
Article
Background: Integrated data repositories (IDRs), also referred to as clinical data warehouses, are platforms used for the integration of several data sources through specialized analytical tools that facilitate data processing and analysis. IDRs offer several opportunities for clinical data reuse, and the number of institutions implementing an IDR has grown steadily in the past decade. Objective: The architectural choices of major IDRs are highly diverse and determining their differences can be overwhelming. This review aims to explore the underlying models and common features of IDRs, provide a high-level overview for those entering the field, and propose a set of guiding principles for small- to medium-sized health institutions embarking on IDR implementation. Methods: We reviewed manuscripts published in peer-reviewed scientific literature between 2008 and 2020, and selected those that specifically describe IDR architectures. Of 255 shortlisted articles, we found 34 articles describing 29 different architectures. The different IDRs were analyzed for common features and classified according to their data processing and integration solution choices. Results: Despite common trends in the selection of standard terminologies and data models, the IDRs examined showed heterogeneity in the underlying architecture design. We identified 4 common architecture models that use different approaches for data processing and integration. These different approaches were driven by a variety of features such as data sources, whether the IDR was for a single institution or a collaborative project, the intended primary data user, and purpose (research-only or including clinical or operational decision making). Conclusions: IDR implementations are diverse and complex undertakings, which benefit from being preceded by an evaluation of requirements and definition of scope in the early planning stage. 
Factors such as data source diversity and intended users of the IDR influence data flow and synchronization, both of which are crucial factors in IDR architecture planning.
... SYBR FAST Universal qPCR Master Mix with low Rox (Roche Sequencing, cat# KK4602) was used and the reaction was performed with a QuantStudio 7 Flex (Applied Biosystems). Four technical replicates of each sample-primer combination were performed and used to determine the standard error. Bulk RNA-seq data processing: De-identified raw RNA-seq data of 40 pediatric ependymal tumors located in the posterior fossa were provided by the CBTTC biorepository (Felmeister et al., 2016) (CBTTC Approved Data Project 19). Gene expression was quantified for each tumor using Kallisto (version 0.45.1) (Bray ...
Preprint
Pediatric ependymoma is a devastating brain cancer marked by its relapsing pattern and lack of effective chemotherapies. This shortage of treatments is partially due to limited knowledge about ependymoma tumorigenic mechanisms. Although there is evidence that ependymoma originates in radial glia, the specific pathways underlying the progression and metastasis of these tumors are unknown. By means of single-cell transcriptomics, immunofluorescence, and in situ hybridization, we show that the expression profile of tumor cells from pediatric ependymomas in the posterior fossa is consistent with an origin in LGR5 ⁺ stem cells. Tumor stem cells recapitulate the developmental lineages of radial glia in neurogenic niches, promote an inflammatory microenvironment in cooperation with microglia, and upon metastatic progression initiate a mesenchymal program driven by reactive gliosis and hypoxia-related genes. Our results uncover the cell ecosystem of pediatric posterior fossa ependymoma and identify WNT/β-catenin and TGF-β signaling as major drivers of tumorigenesis for this cancer.
... Web-based platforms to manage biorepositories are a prolific business, with a multitude of user-friendly inventory and sample tracking software already available for purchase [164,165]. There would be no need to establish a website or program from scratch; rather, we suggest exploring currently available software such as ZIMS 360©, whose developers are adding a biorepository component, currently in beta testing, to their globally established zoo and aquarium management server (personal communication, Mary Ellen, ZIMS 360©). ...
Article
The Amphibian Conservation Action Plan (ACAP), published in 2007, is a formal document of international significance that proposed eleven relevant actions for global amphibian conservation. Action seven of the ACAP document addresses the use of amphibian captive programs as a conservation tool. Appendix material under this action explores the potential use of Genome Resource Banking (biobanking) as an urgently needed tool for these captive programs. ACAP proposed twelve objectives for Genome Resource Banking which exhibit little emphasis on reproduction as a vital underlying science for amphibian Captive Breeding Programs (CBPs). Here we have reassessed the original twelve ACAP objectives for amphibian reproduction and biobanking for CBPs as a contribution to future ACAP review processes. We have reviewed recent advances since the original objectives, as well as highlighted weaknesses and strengths for each of these objectives. We make various scientific, policy and economic recommendations based on the current reality and recent advances in relevant science in order to inform future ACAP towards new global objectives. The number of amphibian CBPs has escalated in recent years and reproductive success is not always easily accomplished. Increases in applied and fundamental research on the natural history and reproductive biology of these species, followed by the appropriate development and application of artificial reproductive technologies (ARTs) and the incorporation of genome resource banks (GRBs), may turn CBPs into a more powerful tool for amphibian conservation.
... A research proposal request for biospecimens and/or data (clinical, imaging, histology slides, genomics) goes through a peer review process to approve specimen distribution and data sharing. 5 After approval, the specimens are delivered to investigators while CBTTC data is available for viewing and download from the Gabriella Miller Kids First Data Resource Center (KF-DRC, ...
Preprint
Background: Pediatric high grade glioma (pHGG) remains a fatal disease. Increased access to richly annotated biospecimens and patient derived tumor models will accelerate pHGG research and support translation of research discoveries. This work describes the pediatric high grade glioma set of the Children's Brain Tumor Tissue Consortium (CBTTC) from the first release (October 2018) of the Pediatric Brain Tumor Atlas (PBTA). Methods: pHGG tumors with associated clinical data and imaging were prospectively collected through the CBTTC and analyzed as the Pediatric Brain Tumor Atlas (PBTA) with processed genomic data deposited into PedcBioPortal for broad access and visualization. Matched tumor was cultured to create high grade glioma cell lines analyzed by targeted and WGS and RNA-seq. A tissue microarray (TMA) of primary pHGG tumors was also created. Results: The pHGG set included 87 collection events (73 patients, 60% at diagnosis, median age of 9 yrs, 55% female, 46% hemispheric). Analysis of somatic mutations and copy number alterations of known glioma genes were of expected distribution (36% H3.3, 47% TP53, 24% ATRX and 7% BRAF V600E variants). A pHGG TMA (n=77), includes 36 (53%) patient tumors with matched sequencing. At least one established glioma cell line was generated from 23 patients (32%). Unique reagents include those derived from a H3.3 G34R glioma and from tumors with mismatch repair deficiency. Conclusion: The CBTTC and PBTA have created an openly available integrated resource of over 2,000 tumors, including a rich set of pHGG primary tumors, corresponding cell lines and archival fixed tissue to advance translational research for pHGG.
... Another important aspect of basic research is the identification of tumor biomarkers, which can be used for early cancer detection, diagnosis and prognosis. Cancer biomarkers are usually studied using body liquid biopsies or tissue samples deposited in biobanks 2,7,8. One of these biobanks has been developed by the NCI Tissue Array Research Program (TARP). ...
Article
Over the past decades, consistent studies have shown that race/ethnicity has a great impact on cancer incidence, survival, drug response, molecular pathways and epigenetics. Despite the influence of race/ethnicity in cancer outcomes and its impact in health care quality, a comprehensive understanding of racial/ethnic inclusion in oncological research has never been addressed. We therefore explored the racial/ethnic composition of samples/individuals included in fundamental (patient-derived oncological models, biobanks and genomics) and applied cancer research studies (clinical trials). Regarding patient-derived oncological models (n = 794), 48.3% have no records on their donor's race/ethnicity, the rest were isolated from White (37.5%), Asian (10%), African American (3.8%) and Hispanic (0.4%) donors. Biobanks (n = 8,293) hold specimens from unknown (24.56%), White (59.03%), African American (11.05%), Asian (4.12%) and other individuals (1.24%). Genomic projects (n = 6,765,447) include samples from unknown (0.6%), White (91.1%), Asian (5.6%), African American (1.7%), Hispanic (0.5%) and other populations (0.5%). Concerning clinical trials (n = 89,212), no racial/ethnic registries were found in 66.95% of participants, and records were mainly obtained from Whites (25.94%), Asians (4.97%), African Americans (1.08%), Hispanics (0.16%) and other minorities (0.9%). Thus, two tendencies were observed across oncological studies: lack of racial/ethnic information and overrepresentation of Caucasian/White samples/individuals. These results clearly indicate a need to diversify oncological studies to other populations along with novel strategies to enhance race/ethnicity data recording and reporting.
... They have shown that storage temperature affected metabolite concentrations only slightly, while the number of temperature change cycles has a strong effect on metabolite stability [110]. Biobanking is considered a critical component of the precision medicine workflow and is being incorporated into several PMIs [112][113][114]. For example, the MOBIT project adopted the oncology biobanking procedures developed by the commercial biotech company Indivumed GmbH (Hamburg, Germany) [22]. ...
... The workflow incorporates several steps that are important for metabolomics analysis and consists of the following major steps: patient qualification into the biobank, anonymization of patient data, collection of biospecimens and patient data, preparation and storage of samples in the biobank, and quality control of tissues and biofluids. Another example of successful biorepository infrastructure is an open-source biorepository management system developed by Felmeister and coworkers [114]. The system consists of electronic Honest Broker (eHB) and Biorepository Portal (BRP) that, in tandem, allow for integration of clinical, specimen and genomic data collected for biorepository resources while protecting patient privacy [114]. ...
... Another example of successful biorepository infrastructure is an open-source biorepository management system developed by Felmeister and coworkers [114]. The system consists of electronic Honest Broker (eHB) and Biorepository Portal (BRP) that, in tandem, allow for integration of clinical, specimen and genomic data collected for biorepository resources while protecting patient privacy [114]. As of January 2016, eight institutions were participating in biobanking activities using this tool kit with over 4000 unique subject records deposited in the eHB and over 30 000 specimens accessioned [114]. ...
Article
Precision medicine is rapidly emerging as a strategy to tailor medical treatment to a small group or even individual patients based on their genetics, environment and lifestyle. Precision medicine relies heavily on developments in systems biology and omics disciplines, including metabolomics. Combining metabolomics with sophisticated bioinformatics analysis and mathematical modeling is a powerful way to provide a metabolic snapshot of the patient over the course of disease and treatment, or to classify patients into subpopulations and subgroups requiring individual medical treatment. Although a powerful approach, metabolomics has certain limitations in technology and bioinformatics. We will review various aspects of metabolomics technology and bioinformatics, from data generation, bioinformatics analysis, data fusion and mathematical modeling to data management, in the context of precision medicine.