Chapter 19
Integrated In Silico Analysis of Proteogenomic and Drug
Targets for Pancreatic Cancer Survival
Alakesh Bera, Digonto Chatterjee, Jack Hester, and Meera Srivastava
Abstract
Pancreatic cancer remains a major health concern, being among the deadliest forms of cancer with over 80%
of the patients presenting with metastatic disease. According to the American Cancer Society, for all stages
of pancreatic cancer combined, the 5-year survival rate is less than 10%. Genetic research on pancreatic
cancer has generally focused on familial pancreatic cancer, which accounts for only 10% of all pancreatic cancer
cases. This study focuses on finding genes that affect the survival of pancreatic cancer patients and that can
be used as biomarkers and potential targets for developing personalized treatment options. We used the cBioPortal
platform with the NCI-initiated The Cancer Genome Atlas (TCGA) dataset to find genes that were altered
differently in different ethnic groups, which can serve as potential biomarkers, and analyzed the genes’
impact on patient survival. The MD Anderson Cell Lines Project (MCLP) and genecards.org were also used
to identify potential drug candidates that can target the proteins encoded by these genes. The results showed
that there are unique genes that are associated with each race category which may influence the survival
outcomes of patients, and their potential drug candidates were identified.
Key words Proteogenomics, Proteomics, Genomics, Survival outcomes, Drug targets, cBioPortal,
The Cancer Genome Atlas (TCGA), MD Anderson Cell Lines Project (MCLP), Copy number
alterations (CNAs), Socioeconomic status (SES), Disease-free survival (DFS), Overall survival (OS),
Pancreatic cancer
1 Introduction
Pancreatic cancer is among the deadliest forms of cancer. It is the
seventh most common cancer, yet it is the second leading cause of
cancer deaths in the United States [1]. It is estimated that in 2021,
48,220 patients will die from pancreatic cancer. Risk factors for
pancreatic cancer include smoking, diabetes, obesity, chronic pan-
creatitis, and family history [2]. Over 80% of the patients present
with metastatic disease. Despite advances in chemotherapy, the
average survival remains less than 5 years even after surgery [3].
Genetic testing has been primarily focused on familial pancreatic
cancer, which accounts for only 10% of all pancreatic
cancers. A study of non-familial pancreatic cancer patients identified
six genes (CDKN2A, TP53, MLH1, BRCA2, ATM, and BRCA1) in which
mutations were significantly associated with pancreatic cancer [4].
A previous study has shown that patient survival was associated with
mutations in KRAS, CDKN2A, SMAD4, and TP53 [5].
Usha N. Kasid and Robert Clarke (eds.), Cancer Systems and Integrative Biology, Methods in Molecular Biology, vol. 2660,
https://doi.org/10.1007/978-1-0716-3163-8_19,
© The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023
Similar to other common malignancies, pancreatic cancer is
associated with disparities by socioeconomic status (SES), ethnic
minority status, and insurance [6, 7]. In contrast to other types of
cancer (breast, colon) where screening can detect early-stage dis-
ease, no screening modality exists for pancreatic cancer. Thus,
disparities in outcomes for pancreatic cancer do not result from
lack of screening [8]. There are currently limited data on the
genetic susceptibility of pancreatic cancer survival based on race
or ancestral origin. The association of driver gene alterations based
on racial category and their association with patient outcomes has
not been clearly established. Therefore, we performed a quantitative
genomic analysis to find genes associated with survival
in pancreatic cancer patients based on their racial categories. In
this study, we aimed to establish a method to identify the
genetic basis of racial disparities in cancer survival. We tried to
determine whether White, African American/Black, and Asian
patients have different gene alterations and whether or not they
can be diagnosed and managed based on their specific signature
profile. Identification, prevention, and management of factors
based on race may help find effective strategies for clinical manage-
ment of pancreatic cancer.
2 Clinical Data from cBioPortal
cBioPortal was used to access and analyze the public data on
cancer tissues generated by the TCGA project
(https://www.cancer.gov/tcga). The Cancer Genome Atlas (TCGA) is a
publicly available cancer genomics database initiated and
established by the NCI. More than thirty thousand tumor samples
from over twenty different types of cancer have already been
sequenced. cBioPortal allows users to query datasets across
data types, including genes and clinical samples, providing an
opportunity to investigate many biologically and/or clinically
relevant hypotheses. For this study, all pancreatic cancer datasets
were selected, with a particular focus on patient survival in
relation to genetic signature. The data also included the number of
patients contributed by each study. Customized datasets were
created from the Pancreatic Cancer Studies in cBioPortal, along with
the age, race, and sex distributions of patients (Fig. 1). Datasets that
did not provide race information were excluded from this analysis.
Fig. 1 Studies included in this analysis of pancreatic cancer patient survival in relation to genetic
signature. (a, b) Number of patients and the contribution of each study. (c, d) Customized dataset
created from Pancreatic Cancer Studies in cBioPortal: (c) age distribution of patients; (d) male vs
female ratio
3 Methods
3.1 Copy Number Alterations
Copy number alteration data was analyzed in each of three racial
categories, (1) White, (2) African American and Black, and
(3) Asian with customized datasets. Within those separate virtual
studies, the genes that were amplified in the most patients in terms
of CNA (copy number alterations) were recorded and compared
among the different virtual datasets.
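The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure cBioPortal runs internally: it assumes discrete copy number calls in the cBioPortal convention (-2 deep deletion, -1 shallow deletion, 0 diploid, 1 gain, 2 amplification), and the function names, group labels, gene names, and toy values are all hypothetical.

```python
from collections import Counter

# Discrete copy number call that denotes a high-level amplification.
AMPLIFIED = 2


def amplification_frequency(cna_calls):
    """Return {gene: percent of samples amplified} for one group.

    cna_calls maps a sample ID to {gene: discrete copy number call}.
    """
    n_samples = len(cna_calls)
    counts = Counter()
    for calls in cna_calls.values():
        for gene, value in calls.items():
            if value == AMPLIFIED:
                counts[gene] += 1
    return {gene: 100.0 * n / n_samples for gene, n in counts.items()}


def unique_top_genes(freq_by_group, top_n=3):
    """Keep, per group, the top_n amplified genes absent from other groups' top lists."""
    top = {}
    for group, freq in freq_by_group.items():
        # Sort by descending frequency, then by gene name for a stable order.
        ranked = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
        top[group] = [gene for gene, _ in ranked[:top_n]]
    unique = {}
    for group, genes in top.items():
        others = {g for other, gs in top.items() if other != group for g in gs}
        unique[group] = [g for g in genes if g not in others]
    return unique


# Toy input (hypothetical): two groups with made-up calls.
toy = {
    "group_A": {
        "s1": {"GENE1": 2, "GENE2": 0},
        "s2": {"GENE1": 2, "GENE2": 2},
        "s3": {"GENE1": 0, "GENE2": 0},
        "s4": {"GENE1": 0, "GENE2": 0},
    },
    "group_B": {
        "s5": {"GENE2": 2, "GENE3": 2},
        "s6": {"GENE2": 0, "GENE3": 2},
    },
}
freqs = {group: amplification_frequency(calls) for group, calls in toy.items()}
unique = unique_top_genes(freqs, top_n=1)
```

Ranking within each group before comparing across groups mirrors the order of steps in the text: frequencies are computed per virtual study first, and only then are the top genes contrasted.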
3.2 Survival Curves
Genes that were altered differently in the racial groups were
selected to analyze their impact on patient survival. Both
overall survival (OS) and disease-free survival (DFS) curves were
computed using the built-in statistical analysis tools of the
cBioPortal platform. Significance was assessed automatically in
cBioPortal with a log-rank test, which compares the survival
distributions of two groups [9, 10].
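For readers who want to see what a two-group log-rank test computes, the following is a minimal standard-library sketch. It is not cBioPortal's implementation; the function name and the toy follow-up times are assumptions made for illustration. At each event time it compares the observed events in one group with the number expected under equal hazards, and it converts the resulting chi-square statistic (1 degree of freedom) into a p-value.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value).

    times_*: follow-up times (e.g., months).
    events_*: 1 if the event was observed, 0 if the patient was censored.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    observed_a = 0.0
    expected_a = 0.0
    variance = 0.0
    for t in event_times:
        at_risk_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        at_risk_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        n = at_risk_a + at_risk_b
        d_a = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 0)
        d = sum(1 for ti, e, _ in data if ti == t and e == 1)
        observed_a += d_a
        expected_a += d * at_risk_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A.
            variance += d * (at_risk_a / n) * (at_risk_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Toy data (hypothetical): group A fails early, group B fails late.
chi_sep, p_sep = logrank_test([1, 2, 3, 4, 5], [1] * 5, [10, 11, 12, 13, 14], [1] * 5)
# Identical groups give no evidence of a difference.
chi_same, p_same = logrank_test([1, 2, 3], [1, 1, 1], [1, 2, 3], [1, 1, 1])
```

A p-value below the chosen significance level (0.05 in this chapter) would be read as evidence that the two survival distributions differ.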
3.3 Protein Drug Analysis
The MD Anderson Cell Lines Project (MCLP) and genecards.org were
used to find protein–drug interactions, treating the genes found in
this study as drug targets [11, 12].
3.4 Pancreatic Cancer Categories
Data from more than 1200 patients spanning ten (10) different pancreatic
cancer clinical studies were analyzed. The disease types were (1) acinar
cell carcinoma of the pancreas, (2) cystic tumor of the pancreas,
(3) five pancreatic adenocarcinoma studies, and (4) three pancreatic
neuroendocrine tumor studies (Fig. 1). The data were stratified
according to three race categories self-identified by patients:
(1) White, (2) African American or Black, and (3) Asian. Only two
samples were identified as Hispanic; therefore,
the Hispanic category was not included in our study. A total of
436 patient samples were selected for the customized datasets based
on the three (3) race categories. The samples were from three
pancreatic adenocarcinoma studies. For our analysis, the Black or
African American and Black labels were taken together. There were
394 samples from White patients, 22 from Asian patients, and 19 from
Black and African American patients. Several studies have sought to
determine the causes of racial disparities in
cancer survival. However, it is not clear whether the disparity is
related to biological differences or to the consequences of social,
economic, or cultural environments. Given the challenges associated
with defining race and ethnicity and with disease-linked biological
observations in this context, results from our study need to be
independently validated in a larger patient population with ancestry
information.
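The stratification step can be illustrated with a short sketch. The label map, the minimum group size, and the sample records below are assumptions made for illustration; the chapter itself reports only that the two Black/African American labels were combined, that records without race information were left out, and that the Hispanic category (two samples) was excluded.

```python
# Hypothetical label map that merges the two labels the text treats together.
MERGED_LABELS = {
    "Black or African American": "African American/Black",
    "Black": "African American/Black",
}


def stratify_by_race(records, min_samples=5):
    """Group sample IDs by self-identified race.

    records: iterable of (sample_id, race_label); an empty label or None
    means race was not reported. min_samples is an assumed cutoff below
    which a category is dropped.
    """
    groups = {}
    for sample_id, race in records:
        if not race:
            continue  # no race information: excluded from the analysis
        label = MERGED_LABELS.get(race, race)
        groups.setdefault(label, []).append(sample_id)
    return {label: ids for label, ids in groups.items() if len(ids) >= min_samples}


# Toy records (hypothetical sample IDs and labels).
records = (
    [("w%d" % i, "White") for i in range(6)]
    + [("b%d" % i, "Black") for i in range(3)]
    + [("a%d" % i, "Black or African American") for i in range(2)]
    + [("h%d" % i, "Hispanic") for i in range(2)]
    + [("x1", None)]
)
strata = stratify_by_race(records, min_samples=5)
```

Merging labels before applying the size cutoff matters here: neither Black label alone reaches the assumed minimum in the toy data, but the combined category does.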
3.5 Data Analysis
The analysis began with an investigation of copy number alterations
(CNAs) at the genomic DNA level for pancreatic cancer samples.
CNAs are changes in the number of
copies of a genetic region caused by deletion or duplication events
in the genome. Recent technological advances have enabled the
identification of CNAs across the entire genome, making it possible
to associate cancer with the alteration rates of various genes [8].
3.5.1 Copy Number Alterations Based on Race Categories
We used cBioPortal, which contains comprehensive genomic
and transcriptomic data from various cancer studies
[13]. Focusing on the customized pancreatic cancer datasets created
based on race categories, as described in the section above, we
determined the genes with the highest frequency of copy number
alterations (CNA) in each race category. A few unique
genes specific to each race category that had copy number
amplifications were found (Table 1). The altered genes are located in
different cytobands for each race category. As previously reported [5],
CDKN2A was deleted in 30% of patients in every race category,
which supports our data analysis approach. Most genes that had deep
deletions in copy number were common among the different race
categories and therefore were not included in further investigation
and analysis.
Table 1
Genes with copy number alterations unique to each race category
Race | Genes with copy number alterations (AMP) | Cytoband | Alteration frequency (%)
White | GATA6, RECQL4, MIB1 | 18q11.2 (GATA6, MIB1); 8q24.3 (RECQL4) | 14.8 (GATA6); 14.0 (RECQL4); 12.8 (MIB1)
Black | PKD1L1, GARS1, NEUROD6 | 7p12.3 (PKD1L1); 7p14.3 (GARS1, NEUROD6) | 28.6
Asian | FGFR3, ABCA11P, UVSSA (etc.) | 4p16.3 | 27.3
3.5.2 Association of CNA Amplification with Survival Outcome
We evaluated the association between survival outcome and the gene
alterations that were unique to, and most frequently amplified in,
each race category. The most frequently
altered genes in each category were found to have a direct
association with patient survival. Kaplan–Meier curves for overall survival
(OS) and disease-free survival (DFS) were generated using the
cBioPortal platform for each gene in the specific race categories.
P-values were generated by cBioPortal using the log-rank test for
survival between two groups and were compared to a significance
level of 0.05. Survival was reported as the median in months,
together with the 95% confidence interval.
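As an illustration of how a median survival time is read off a Kaplan–Meier curve, the sketch below implements the product-limit estimator in plain Python. It is a simplified stand-in for what cBioPortal reports, not its actual code; the function names and toy follow-up data are hypothetical, and the confidence interval is omitted.

```python
def kaplan_meier(times, events):
    """Return the Kaplan-Meier curve as [(time, survival probability), ...].

    times: follow-up times (e.g., months).
    events: 1 if the event was observed, 0 if the patient was censored.
    One point is emitted per distinct time at which an event occurred.
    """
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


def median_survival(times, events):
    """First time at which estimated survival is 0.5 or lower, else None."""
    for t, s in kaplan_meier(times, events):
        if s <= 0.5:
            return t
    return None


# Toy cohorts (hypothetical months of follow-up).
median_all_events = median_survival([2, 4, 6, 8, 10], [1, 1, 1, 1, 1])
median_with_censoring = median_survival([1, 3, 5, 7], [1, 0, 1, 1])
median_not_reached = median_survival([1, 2, 3], [0, 0, 0])
```

Censored patients leave the risk set without lowering the curve, which is why the median can be undefined (returned as `None` here) when too few events are observed, as happens with very small groups.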
Figure 2 shows representative Kaplan–Meier curves for disease-
free survival (DFS) outcomes associated with alterations of genes in
patients from different race categories. The P-values are 0.0031 for
White patients, 0.0516 for Asian patients, and 0.8680 for African
American and Black patients.
The results showed that there are unique genes associated
with each race category that may affect the survival
outcomes of patients. For patients with amplification of these
genes, median survival in months was generally shorter than for those
without amplification.
White Patients For the genes amplified in the White race category,
the effect on OS was minimal, while DFS was significantly worse
than in patients without amplification (p = 0.0031).
The genes amplified in the White race category were from
cytobands 18q11.2 (GATA6, MIB1) and 8q24.3 (RECQL4).
Notably, genes from the same cytoband showed
nearly identical data.
Fig. 2 Kaplan–Meier curves for disease-free survival in each race category. Statistical values (P-value) are
included in the figure
Table 2
Survival outcomes (in months) associated with altered genes in each race category
Race category | Genes altered | Overall survival (OS), unaltered | Overall survival (OS), altered | Disease-free survival (DFS), unaltered | Disease-free survival (DFS), altered
White | GATA6 | 20.17 | 20.35 | 20.4 | 12.43
White | RECQL4 | 20.35 | 15.11 | 20.37 | 9.57
White | MIB1 | 20.19 | 20.34 | 23.52 | 12.42
African American & Black | PKD1L1, GARS1, NEUROD6 | 17.03 | 2.01 | NA | NA
Asian | FGFR3, ABCA11P, UVSSA (etc.) | 66.89 | 11.61 | 12.32 | 49.68
Asian Patients For genes amplified in the Asian race category,
survival outcomes for patients with amplified genes were worse for
both overall survival and disease-free survival than for patients
without amplified genes, although with a p-value of 0.0516 the
difference narrowly missed the 0.05 significance level. The genes in
the Asian race category were all from the same
cytoband (4p16.3), and while only three are listed in Table 2,
39 genes from this cytoband were all amplified and
showed the same survival outcomes.
African American and Black Patients For genes amplified in the
African American and Black race category, overall survival was
markedly shorter for patients with amplification of PKD1L1, GARS1,
and NEUROD6 on chromosome arm 7p (2 months) than for patients
with no amplification (17 months). No disease-free survival data
were available because of the limited number of samples. Notably,
data for the African American and Black
category were limited (n = 19), and the differences were
not statistically significant.
Table 3
Role of the genes in cancers and as potential drug candidates
Genes/race categories | Role in pancreatic cancer | Role in other cancers (prognostic value) | Drug candidates
GATA6 (White) | No | Renal (unfavorable) (detected in many) | Spautin-1; parbendazole
RECQL4 (White) | No | Liver (unfavorable) (detected in all) | Rec 15/2615 dihydrochloride
MIB1 (White) | No | None (detected in all) | SIB 1893
PKD1L1 (Black) | No | None (detected in many) | JKC 363; HS 014
GARS1 (Black) | NA | NA | MRS 1220; RS 100329 hydrochloride
NEUROD6 (Black) | No | None (not detected) | SB 224289 hydrochloride; Pancuronium dibromide
FGFR3 (Asian) | No | Endometrial (unfavorable) (detected in many) | PD 173074
ABCA11P (Asian) | NA | NA | Bobcat339; TBCA
UVSSA (Asian) | No | Renal (unfavorable); urothelial (favorable) (detected in all) | Iressa; ASB 14780
3.5.3 Protein Expression and Protein–Drug Interaction
Using the Protein Atlas, protein–drug interaction data from the MD
Anderson Cell Lines Project (MCLP) dataset [11, 12], and
genecards.org data, we analyzed the role in cancers of the genes
that are the focus of our study and identified drug candidates that
can target the proteins encoded by these genes.
We propose potential drug candidates that can be further tested in
clinical trials to provide gene- and race-specific therapeutic
options for pancreatic cancer patients. Table 3 shows the role of the genes
in cancers, their role as prognostic markers, and drugs that can be
used to target these genes. The two top-ranked drugs targeting each
gene were included.
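Selecting the two top-ranked drugs per gene amounts to a grouped top-k selection, sketched below. The chapter does not state which score was used for ranking, so the numeric score, the function name, and the placeholder gene and drug names are all assumptions made purely for illustration.

```python
def top_candidates(interactions, k=2):
    """Return {gene: [k highest-scoring drugs]}.

    interactions: iterable of (gene, drug, score) tuples, where a higher
    score is assumed to mean a stronger protein-drug association.
    """
    by_gene = {}
    for gene, drug, score in interactions:
        by_gene.setdefault(gene, []).append((score, drug))
    result = {}
    for gene, pairs in by_gene.items():
        # Sort by descending score, breaking ties alphabetically by drug name.
        ranked = sorted(pairs, key=lambda p: (-p[0], p[1]))
        result[gene] = [drug for _, drug in ranked[:k]]
    return result


# Placeholder records (hypothetical genes, drugs, and scores).
interactions = [
    ("GENE_X", "drug_a", 0.91),
    ("GENE_X", "drug_b", 0.40),
    ("GENE_X", "drug_c", 0.75),
    ("GENE_Y", "drug_d", 0.10),
]
shortlist = top_candidates(interactions, k=2)
```

A gene with fewer than k scored drugs simply yields a shorter list, which matches Table 3, where some genes have a single candidate.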
4 Notes
1. This study demonstrates that there are unique sets of genes
commonly associated with self-identified race.
2. Alterations in these genes are associated with patient outcomes
in patients with pancreatic cancer.
3. These data, if validated in a large population with ancestry
information, may serve as potential biomarkers for specific race
groups to predict patient outcomes.
4. Additionally, several potential drug candidates proposed in our
study targeting the specific proteins encoded by the genes
specific to the individual race categories can be further studied
in clinical settings to develop more personalized race-based
therapeutic options.
5. In addition, understanding the molecular events and mechanisms
that determine patient outcomes has the potential to lead to
new and improved treatment approaches for patients with
pancreatic cancer.
6. These results are corroborated by other studies, as GATA6 is now
a known oncogene [14], meaning it has the potential to cause
cancer. Similarly, studies have shown that copy number alterations
in the cytoband 8q24.3 have a role in cancer [15–17].
7. The role of genes and their respective cytobands from the Black
and African American as well as Asian race category was more
novel. Of the genes found to be amplified in these two race
categories, only FGFR3 is known to be an oncogene [18].
8. It is notable that the cytoband 4p16.3 which was amplified
frequently in the Asian race category, showing 39 genes all
with frequent alteration, causes Wolf–Hirschhorn syndrome
when deleted [19]; however, there has been little investigation
on its amplification and no study on its unique amplification in
the Asian race category or pancreatic cancer.
9. The biggest shortcoming was that the sample size was low,
especially for the Black and African American and Asian race
categories. Despite this, the results can be used as a good
starting point for further, more robust investigation.
10. Future goals include developing new and improved treatment
for patients with pancreatic cancer, potentially by targeting the
genes that were noted in this study.
11. Another potential goal would be to perform statistical modeling and
a larger population study to determine whether these genes can be used
for diagnosis of pancreatic cancer, which would be especially
noteworthy because pancreatic cancer is known to be hard to diagnose
early.
12. Furthermore, the specific role of each cytoband that was largely
amplified and its association with pancreatic cancer can be
investigated.
Acknowledgements
Alakesh Bera and Digonto Chatterjee contributed as co-equal first
authors on this study.
Disclaimer The opinions or assertions contained herein are the
private ones of the authors and are not to be construed as official or
reflecting the views of the Department of Defense, the Uniformed
Services University of the Health Sciences, or any other agency of
the U.S. Government.
References
1. Siegel RL, Miller KD, Jemal A (2019) Cancer
statistics, 2019. CA Cancer J Clin 69(1):7–34.
https://doi.org/10.3322/caac.21551
2. Ryan DP, Hong TS, Bardeesy N (2014) Pan-
creatic adenocarcinoma. N Engl J Med
371(22):2140–2141. https://doi.org/10.
1056/NEJMc1412266
3. Noel M, Fiscella K (2019) Disparities in pan-
creatic cancer treatment and outcomes. Health
Equity 3(1):532–540. https://doi.org/10.
1089/heq.2019.0057
4. Hu C, Hart SN, Polley EC et al (2018) Associ-
ation between inherited germline mutations in
cancer predisposition genes and risk of pancre-
atic cancer. JAMA 319(23):2401–2409.
https://doi.org/10.1001/jama.2018.6228
5. Qian ZR, Rubinson DA, Nowak JA et al
(2018) Association of alterations in main driver
genes with outcomes of patients with resected
pancreatic ductal adenocarcinoma. JAMA
Oncol 4(3):e173420. https://doi.org/10.
1001/jamaoncol.2017.3420
6. Khawja SN, Mohammed S, Silberfein EJ et al
(2015) Pancreatic cancer disparities in African
Americans. Pancreas 44(4):522–527. https://
doi.org/10.1097/MPA.0000000000000323
7. Shapiro M, Chen Q, Huang Q et al (2016)
Associations of socioeconomic variables with
resection, stage, and survival in patients with
early-stage pancreatic cancer. JAMA Surg
151(4):338–345. https://doi.org/10.1001/
jamasurg.2015.4239.
8. Zhang Q, Zeng L, Chen Y et al (2016) Pancre-
atic cancer epidemiology, detection, and man-
agement. Gastroenterol Res Pract 8962321.
https://doi.org/10.1155/2016/8962321
9. Cerami E, Gao J, Dogrusoz U et al (2012) The
cBio cancer genomics portal: an open platform
for exploring multidimensional cancer geno-
mics data. Cancer Discov 2(5):401–404.
https://doi.org/10.1158/2159-8290.CD-
12-0095
10. Gao J, Aksoy BA, Dogrusoz U et al (2013)
Integrative analysis of complex cancer geno-
mics and clinical profiles using the cBioPortal.
Sci Signal 6(269):11. https://doi.org/10.
1126/scisignal.2004088
11. Chen MM, Li J, Mills GB, Liang H (2020)
Predicting cancer cell line dependencies from
the protein expression data of reverse-phase
protein arrays. JCO Clin Cancer Inform 4:
357–366. https://doi.org/10.1200/CCI.19.
00144
12. Li J, Zhao W, Akbani R et al (2017) Character-
ization of human cancer cell lines by reverse-
phase protein arrays. Cancer Cell 31(2):
225–239. https://doi.org/10.1016/j.ccell.
2017.01.005
13. Bera A, Subramanian M, Karaian J et al (2020)
Functional role of vitronectin in breast cancer.
PLoS One 15(11):e0242141. https://doi.
org/10.1371/journal.pone.0242141
14. Deng X, Jiang P, Chen J et al (2020) GATA6
promotes epithelial-mesenchymal transition
and metastasis through MUC1/beta-catenin
pathway in cholangiocarcinoma. Cell Death
Dis 11(10):860. https://doi.org/10.1038/
s41419-020-03070-z
15. Bera A, Radhakrishnan S, Russ E et al (2022)
Functional role of long non-coding RNA
YTHDF3-AS1 in breast cancer. Cancer Res
82(12_Suppl):5828
16. Bera A, Karaian J, Subramanian M et al (2020)
EXOSC4, a novel gene at chromosome 8q24
loci is linked with breast cancer progression and
is a prognostic marker for breast cancer sur-
vival. Cancer Res 80(16_Supplement):4322
17. Brusselaers N, Ekwall K, Durand-Dubief M
(2019) Copy number of 8q24.3 drives HSF1
expression and patient outcome in cancer: an
individual patient data meta-analysis. Hum
Genomics 13(1):54. https://doi.org/10.
1186/s40246-019-0241-3
18. Zingg D, Bhin J, Yemelyanenko J et al (2022)
Truncated FGFR2 is a clinically actionable
oncogene in multiple cancers. Nature 608
(7923):609–617
19. Zollino M, Orteschi D, Ruiter M et al (2014)
Unusual 4p16.3 deletions suggest an addi-
tional chromosome region for the Wolf-
Hirschhorn syndrome-associated seizures dis-
order. Epilepsia 55(6):849–857