Procedia Computer Science 00 (2010) 000–000
www.elsevier.com/locate/procedia
WCIT-2010
Synergy network based inference for breast cancer metastasis
Farzana Kabir Ahmad a,*, Safaai Deris b and Mohd. Syazwan Abdullah c
a,c Graduate Department of Computer Science, Universiti Utara Malaysia, 06010 Sintok, Kedah, Malaysia.
b Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia.
Abstract
Breast cancer is a leading cancer worldwide and is characterized by aggressive metastasis. In many patients,
microscopic or clinically evident metastases have already occurred by the time the primary tumor is diagnosed.
Chemotherapy or hormonal therapy reduces the risk of distant metastasis by one-third, but it is estimated that
about 70% to 80% of patients receiving treatment would have survived without it. Being able to predict breast
cancer metastasis can therefore spare a significant number of patients from unnecessary adjuvant systemic
treatment and its related medical costs. Current studies have demonstrated the potential value of gene expression
signatures in assessing the risk of post-surgical disease recurrence. However, most of these studies attempt to
develop genetic marker-based prognostic systems to replace the existing clinical criteria, while ignoring the rich
information contained in established clinical markers. Clinical markers, such as patient history and laboratory
analysis, which are the basis of day-to-day clinical decision support, are often underused in guiding the clinical
management of cancer when microarray data are present. Given the complexity of breast cancer prognosis, we
propose a novel strategy based on a synergy network that utilizes both clinical and genetic markers to identify
potential hybrid signatures and investigate their interactions associated with breast cancer metastasis. In this
study, the computational method is applied to publicly available microarray and clinical data. A rigorous
experimental protocol is used to estimate the prognostic performance of the hybrid signature and other prognostic
approaches. The hybrid signature performs significantly better than other methods, including the 70-gene
signature, clinical markers alone and the St. Gallen consensus criterion. At the 90% sensitivity level, the hybrid
signature achieves 77% specificity, compared with 53% for the 70-gene signature and 43% for the clinical markers.
The predicted results also show a strong dependence on regulator genes that are related to cell death in the cell
development process. These significant gene regulators are useful for understanding cancer biology and for new
drug design.
Keywords: Synergy network; Bayesian network; breast cancer metastasis; inference; conditional independence.
1. Introduction
Breast cancer is a leading cause of cancer-related death and one of the most aggressively metastatic diseases
worldwide. With 410,000 deaths each year, it accounts for more than 1.6% of all female deaths worldwide [1]. The
major clinical problems of breast cancer are the recurrence of disseminated disease and metastatic behavior. In
numerous patients, microscopic or clinically evident metastases have already occurred by the time the primary
tumor is diagnosed. Although treatments such as chemotherapy and endocrine therapy can reduce the risk of distant
metastasis by approximately one-third, it is estimated that 80% of patients would have survived without receiving
them. Being prescribed highly expensive medicines that turn out to be unnecessary causes complications and
exacerbates the condition of breast cancer patients. As a result, the study of tumor progression and breast cancer
metastasis has become of great interest in the biomedical field.
Procedia Computer Science 3 (2011) 1094–1100
doi:10.1016/j.procs.2010.12.178
1877-0509 © 2010 Published by Elsevier Ltd. Open access under CC BY-NC-ND license.
Selection and/or peer-review under responsibility of the Guest Editor.
Despite significant advances in the treatment of primary breast cancer and the large number of studies conducted,
the ability to infer the metastatic behavior of tumors remains one of the greatest clinical challenges in oncology.
The main cause is the complex interactions involved in cancer progression and metastasis formation. Three commonly
used treatment guidelines, the TNM (tumor, lymph nodes and metastasis) Tumor Staging System and the St. Gallen and
NIH (National Institutes of Health) consensus criteria, have been used to assess distant metastasis. These breast
cancer indices are based on clinical markers such as tumor size, lymph node involvement, patient age and the
aggressiveness of the cancer as judged from histopathological parameters. Although these indices are widely
practiced, they predict therapy failure inaccurately, with only 10% specificity at the 90% sensitivity level. Thus,
a more accurate prognostic criterion is urgently needed to avoid unnecessary treatment in newly diagnosed patients.
Recently, the development of genetic marker-based prognostic systems has become a breakthrough in cancer
progression research, and most studies have concentrated their efforts solely on this approach. Yet some
researchers believe that gene expression data are often overused to infer cancer progression when clinical data
are available [2]. Clinical data, which are used on a daily basis, have been neglected, and the rich information
contained in established clinical markers has been ignored. Given the complexity of breast cancer metastasis, a
more practical and sensible strategy is to incorporate both clinical and genetic markers, which may contain
complementary information.
A small number of studies have examined the possibility of integrating clinical and genetic markers to infer
breast cancer metastasis [3, 4]. While some of these approaches show great promise in incorporating two different
kinds of markers, the issue of high-dimensional data has rarely been discussed. One important characteristic of
microarray data is the extremely large number of variables measured on a very small number of samples. Thus,
integrating two kinds of markers to infer cancer metastasis can be computationally complex, as a large number of
variables may need to be examined. In this paper, we seek to improve the ability to infer breast cancer metastasis
using a novel strategy known as a synergy network. This method is based solely on a Bayesian network that applies
two different approaches: an information-theoretic approach and a conditional independence approach. Our main
interest is to obtain a correctly learnt network in order to examine the two kinds of markers, clinical and
genetic, in the presence of a third variable that represents the state of the cell (metastasis). In addition, we
score marker interactions, which provides insight into tumor progression and indicates the markers that strongly
regulate breast cancer metastasis.
The remainder of this paper is organized as follows. Section 2 describes the method used to develop the synergy
network, based on a Bayesian network, to integrate two diverse kinds of markers. This section also elaborates on
the approaches taken to learn the network structure correctly in order to address the issue of high-dimensional
data. The empirical results and discussion are presented in Section 3, while Section 4 provides concluding remarks.
2. Methods
2.1 Bayesian network
A Bayesian network is a probabilistic graphical model in which vertices represent random variables and the absence
of an edge between two vertices represents conditional independence. Consider a finite set
$V_n = \{X_1, X_2, \ldots, X_n\}$ of random variables. A Bayesian network representation contains two components:
a directed acyclic graph (DAG) $G = (V_n, E_G)$, whose vertices correspond to the random variables, and the
conditional probability distributions of the random variables given the variables they depend on (their parents)
in $G$. The joint distribution is the product of these conditional probability distributions:

$$P(X_1, \ldots, X_n) = \prod_{X_i \in V_n} P(X_i \mid Pa(X_i)) \qquad (1)$$

where $P(X_i \mid Pa(X_i))$ is the set of conditional probabilities for each variable $X_i$ and $Pa(X_i)$ is the
set of variables that are the parents of $X_i$ in graph $G$.
We use an example to illustrate the basic idea of Bayesian networks. Given the Bayesian network specified in Fig. 1
for five genes X1, X2, X3, X4 and X5, the structure specifies the parents of genes X3, X4 and X5:
Pa(X3) = {X1, X2}, Pa(X4) = {X1}, Pa(X5) = {X3}, where Pa(V) represents the parent vertex set of vertex V.
Fig. 1: A simple Bayesian network representation that explicates relationships between five genes
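The factorization in Eq. (1) can be illustrated for the five-gene example with a minimal Python sketch. It assumes binary expression states, and the conditional probability values are made up for illustration only; they are not taken from the paper. The names `PARENTS`, `P_ON`, `joint_probability` and `total_probability` are hypothetical.

```python
from itertools import product

# Parent sets of the five-gene example (Fig. 1).
PARENTS = {"X1": (), "X2": (), "X3": ("X1", "X2"), "X4": ("X1",), "X5": ("X3",)}

# Hypothetical tables: P(gene = 1 | parent states). Illustrative values only.
P_ON = {
    "X1": {(): 0.3},
    "X2": {(): 0.6},
    "X3": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9},
    "X4": {(0,): 0.2, (1,): 0.7},
    "X5": {(0,): 0.05, (1,): 0.8},
}


def joint_probability(state):
    """P(X1, ..., X5) as the product of P(Xi | Pa(Xi)) over all genes."""
    prob = 1.0
    for gene, parents in PARENTS.items():
        parent_state = tuple(state[p] for p in parents)
        p_on = P_ON[gene][parent_state]
        prob *= p_on if state[gene] == 1 else 1.0 - p_on
    return prob


def total_probability():
    """Sum of the joint probability over all 2^5 states (should equal 1)."""
    genes = list(PARENTS)
    return sum(
        joint_probability(dict(zip(genes, bits)))
        for bits in product((0, 1), repeat=len(genes))
    )
```

Because each factor is a proper conditional distribution, the joint probabilities of all 32 states sum to one, which is a quick sanity check on any hand-specified network.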
In our context, the conditional independence of gene sets X and Y given a gene set Z can be interpreted as follows:
when the genes in Z are at fixed expression levels, the expression levels of the genes in X give no information
about the expression levels of the genes in Y, and vice versa. Once the structure of G is specified for a set of
genes, a directed edge from X to Y in G can be interpreted as a statement that X is the “cause” of Y, or that the
expression level of X has an effect on the expression level of Y. Therefore, obtaining the correct structure of the
Bayesian network is essential to perform efficient inference and to represent the dependency relationships
correctly.
Determining the optimal network through Bayesian structure learning has been investigated for decades. With the
invention of microarray technology, the problem of searching for the network that best fits a given dataset has
become harder. Analyzing such high-dimensional data involves a large number of variables, which leads to
exponential growth of the search space and makes the problem NP-hard. Generally, there are two approaches to
learning the structure of Bayesian networks from data: search-and-score methods and dependency analysis methods.
In the first approach, the learning problem is viewed as a search for the structure that best fits the data.
Different scoring methods have been applied to measure the fit between the network structure and the data,
including Bayesian scoring, entropy-based scoring and minimum description length, among others. The dependency
analysis approach, on the other hand, tries to discover the dependencies among variables from the data and then
uses these dependencies to construct the network structure. The dependency analysis approach has been found to be
more efficient than the search-and-score approach for sparse networks (those in which the number of edges is
relatively small). To achieve our goal of inferring breast cancer metastasis by integrating clinical and genetic
markers, we propose a new strategy for finding the optimal structure. Section 2.2 introduces the main steps of the
proposed method.
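To give a sense of how quickly the structure search space grows, the sketch below counts the labeled DAGs on n nodes using Robinson's recurrence. This recurrence is a standard combinatorial result and is not part of the method described in this paper; the function name `count_dags` is hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labeled directed acyclic graphs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )
```

Even the five-gene example admits 29,281 candidate structures, so exhaustive scoring is not feasible for networks with dozens of markers.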
2.2 Synergy network based on information-theoretic approach and conditional independence
Clinical markers and gene expression profiles play an important role in determining breast cancer metastasis.
Integrating and analyzing all of this information to discover the factors that regulate cancer progression requires
a network-based algorithm. In this study, we employ a novel strategy known as a synergy network to achieve this
objective. The method is based solely on a Bayesian network and relies on two structure learning approaches: an
information-theoretic approach and conditional independence. It has two features. In the first feature, the synergy
network is built using mutual information, which measures the cooperative effect of two variables on the state of
a third. The two variables in this case are a gene and a clinical marker ($G_i$ and $C_i$), and the third is the
binary state variable representing the occurrence of metastasis ($M$), with $M = 1$ when metastasis is present and
$M = 0$ when it is not. Mutual information is used to decide which interactions (edges) are more prominent than
others. The mutual information between random variables $X$ and $Y$ is defined as follows:

$$I(X; Y) = H(X) - H(X \mid Y) \qquad (2)$$

where $H(X)$ is the entropy of $X$ and $H(X \mid Y)$ is the conditional entropy of $X$ given $Y$. Applying the same
rule to our study, we calculate the integration of genes and clinical markers ($G_i$ and $C_i$) as:

$$I(G_i, C_i; M) - \left[ I(G_i; M) + I(C_i; M) \right] \qquad (3)$$

where $I(G_i, C_i; M)$ is the cooperative effect of the two variables and $I(G_i; M) + I(C_i; M)$ is the
individual effect.
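As a minimal sketch of Eqs. (2) and (3), the following Python code estimates mutual information and the synergy score from discrete samples using plug-in (empirical frequency) entropies. It assumes the expression levels and clinical markers have already been discretized; the estimator and the function names `entropy`, `mutual_information` and `synergy` are assumptions for illustration, not the authors' implementation.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Plug-in entropy H(X), in bits, of a list of discrete observations."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) - H(X|Y), using H(X|Y) = H(X,Y) - H(Y)."""
    h_x_given_y = entropy(list(zip(xs, ys))) - entropy(ys)
    return entropy(xs) - h_x_given_y


def synergy(gene, clinical, metastasis):
    """Cooperative effect minus the sum of the individual effects (Eq. 3)."""
    pair = list(zip(gene, clinical))  # treat (G_i, C_i) as one joint variable
    cooperative = mutual_information(pair, metastasis)
    individual = mutual_information(gene, metastasis) + mutual_information(
        clinical, metastasis
    )
    return cooperative - individual
```

A toy case shows the intended behavior: if metastasis status were the exclusive-or of a binary gene and a binary clinical marker, neither marker alone would carry information, yet the pair would determine the outcome, giving a synergy of 1 bit. Two identical markers, by contrast, give a negative score because their information is redundant.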
The proposed algorithm starts from an unconnected network, in which there are no edges between nodes. We then
calculate the mutual information for each pair of nodes in the network and order the candidate edges by these
values. Connections between nodes are drawn sequentially in this order, so that the highest-scoring interactions
are added first. Edges whose mutual information value is less than the threshold (threshold = 0.1) are excluded as
candidates for correct edges. Only edges with higher values (> threshold) are chosen as correct edges, as they have
a higher probability of representing a true connection.
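The edge selection step described above can be sketched as follows, assuming the pairwise mutual information scores have already been computed. Only the threshold of 0.1 comes from the text; the node names and scores in the usage note are placeholders, and the function names are hypothetical.

```python
THRESHOLD = 0.1  # value stated in the text


def select_edges(scores, threshold=THRESHOLD):
    """Return candidate edges with score > threshold, strongest first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [edge for edge, score in ranked if score > threshold]


def build_network(nodes, scores, threshold=THRESHOLD):
    """Start from an edgeless network and add the selected edges in rank order."""
    adjacency = {node: set() for node in nodes}
    for a, b in select_edges(scores, threshold):
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency
```

For example, with placeholder scores `{("A", "B"): 0.42, ("B", "C"): 0.05, ("A", "C"): 0.17}`, the edges A-B and A-C are kept in that order and B-C is discarded.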
In the second feature, we further examine the structure of the network constructed in the first feature using the
conditional independence approach. Two variables, for instance A and B, may have different structures, A → B and
A ← B, that carry identical mutual information values. To overcome this issue, conditional independence is used to
search for incorrect edges in a triangular structure, as depicted in Fig. 2. Conditional independence is defined
as follows:

$$P(X_i, X_j \mid X_k) = P(X_i \mid X_k)\, P(X_j \mid X_k) \qquad (4)$$

Hence, once we detect edges that form a triangular loop and hold the same mutual information value, all three edges
in the triangle are tested using Eq. (4), and the results of these tests are used to update the network.
Fig. 2: Triangular structure of three nodes and two edges
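One way to check Eq. (4) on discrete data is to compare both sides of the equation using empirical frequencies, as in the sketch below. The paper does not specify the test statistic, so the maximum absolute deviation and the tolerance `tol` used here are assumptions chosen for illustration, as are the function names.

```python
from collections import Counter
from itertools import product


def max_ci_deviation(xi, xj, xk):
    """Largest |P(xi, xj | xk) - P(xi | xk) P(xj | xk)| over observed values."""
    n_k = Counter(xk)
    n_ik = Counter(zip(xi, xk))
    n_jk = Counter(zip(xj, xk))
    n_ijk = Counter(zip(xi, xj, xk))
    worst = 0.0
    for a, b, k in product(set(xi), set(xj), set(xk)):
        joint = n_ijk[(a, b, k)] / n_k[k]
        factored = (n_ik[(a, k)] / n_k[k]) * (n_jk[(b, k)] / n_k[k])
        worst = max(worst, abs(joint - factored))
    return worst


def conditionally_independent(xi, xj, xk, tol=0.05):
    """True if Eq. (4) holds within the (hypothetical) tolerance tol."""
    return max_ci_deviation(xi, xj, xk) <= tol
```

In a triangle of three nodes, an edge between two nodes that turn out to be conditionally independent given the third would be a candidate for removal.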
2.3 Dataset and Pre-processing
The proposed method is tested and analyzed on the dataset of van’t Veer et al. [5], which was obtained from the
Integrated Tumor Transcriptome Array and Clinical data Analysis database (ITTACA, 2006). This dataset contains
expression profiles derived from 97 lymph node-negative breast cancer patients, 55 years old or younger, and
associated clinical information including age, tumor size, histological grade, angioinvasion, lymphocytic
infiltration, and estrogen receptor (ER) and progesterone receptor (PR) status, which together form the clinical
markers. Prior to applying the proposed method, the missing values in this dataset were addressed. Missing gene
expression values are a common problem in microarray datasets. The K-nearest neighbors (kNN) imputation method
with k = 10 was used to handle these missing values. kNN imputation was chosen because it was shown by
Troyanskaya et al. [6] to be a robust and sensitive approach for estimating missing values in microarray data. The
processed dataset was then used as input to the proposed method.
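The imputation step can be sketched as below. This is a simplified version of kNN imputation, assuming rows are genes, columns are samples and missing values are encoded as `None`; it uses an unweighted mean of the nearest rows, whereas the procedure of Troyanskaya et al. [6] may differ in details such as neighbor weighting. Only k = 10 comes from the text; the function names are hypothetical.

```python
from math import sqrt


def _distance(row_a, row_b):
    """Root-mean-square difference over the columns observed in both rows."""
    shared = [(a, b) for a, b in zip(row_a, row_b) if a is not None and b is not None]
    if not shared:
        return float("inf")
    return sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))


def knn_impute(matrix, k=10):
    """Fill each None with the mean of that column over the k nearest rows."""
    filled = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbors: other rows that have a value in column j.
            candidates = sorted(
                (_distance(row, other), other[j])
                for h, other in enumerate(matrix)
                if h != i and other[j] is not None
            )
            neighbours = [v for d, v in candidates[:k] if d != float("inf")]
            if neighbours:  # leave the gap if no usable neighbor exists
                filled[i][j] = sum(neighbours) / len(neighbours)
    return filled
```

Distances are computed on the original matrix, so an imputed value never influences the imputation of another entry.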
3. Results and Discussion
The proposed method was executed on the breast cancer dataset to obtain insights into cancer development and how
various factors may trigger metastasis progression, producing the synergy network shown in Fig. 3. The top ten
scoring interactions for this network are given in Table 1. The learned network reveals a group of genes and
clinical markers that are primarily associated with causing metastasis, M. The larger nodes in the graph denote
genes that, when expressed at different levels, have a major effect on the status (e.g., on or off) of other genes
or clinical markers, and the light-shaded nodes denote highly regulated genes. Four genes are found to regulate the
expression levels of other genes: BBC3, GNAZ, TSPY-like 5 (TSPYL5) and DCK. Two genes are highly regulated:
FLJ11354 and CCNE2. Meanwhile, angioinvasion is identified as a strong factor in causing breast cancer metastasis.
The network involves 50 genes and 6 clinical markers that are closely associated with breast cancer metastasis, M.
Fig. 3: Synergy network for breast cancer metastasis
The constructed network indicates that the BBC3 gene has a prominent role in regulating other genes. Eight genes
are correlated with BBC3. The BBC3 gene, also known as PUMA, is activated by the tumor suppressor p53, which is a
key regulator of apoptosis and tumorigenesis in breast cancer. On the other hand, there is insufficient information
about whether GNAZ directly regulates the progression of breast cancer; however, we found that it has an essential
role in cellular processes of the nervous system [7]. Meanwhile, TSPYL5 has been identified as a genetic marker for
breast cancer in several studies [4, 8]. Lastly, DCK is associated with resistance to antiviral and anticancer
chemotherapeutic agents, so this gene is clinically important because of its relationship to drug resistance and
sensitivity. Beyond these regulator genes, our analysis identified two highly regulated genes: FLJ11354 and CCNE2.
The FLJ11354 gene was reported by Sun et al. [9], while CCNE2 has been reported to qualify as an independent
prognostic marker for lymph node–negative breast cancer patients [10]. Among the clinical markers, angioinvasion is
identified as a critical factor leading to cancer progression compared with the other clinical markers.
We then evaluated the performance of the constructed synergy network using a receiver operating characteristic
(ROC) curve obtained by varying a decision threshold, which gives a direct view of how the inference network
performs at different sensitivity and specificity levels. Following van’t Veer and colleagues [5], the sensitivity
is set to 90%. The corresponding specificities are computed and reported in Table 2. For comparison, the
specificities of the TNM Tumor Staging System and the St. Gallen and NIH consensus criteria are also reported.
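Reading a specificity off the ROC curve at a fixed sensitivity can be sketched as follows. The code assumes each patient has a continuous risk score (higher meaning higher predicted risk) and a binary label with 1 for metastasis, and that both classes are present; the function name and the example data in the usage note are hypothetical and do not reproduce the paper's results.

```python
def specificity_at_sensitivity(scores, labels, target_sensitivity=0.9):
    """Best specificity among thresholds whose sensitivity meets the target."""
    positives = sum(labels)
    negatives = len(labels) - positives
    best = 0.0
    for threshold in sorted(set(scores)):
        predicted = [s >= threshold for s in scores]  # True = predicted metastasis
        tp = sum(p and y == 1 for p, y in zip(predicted, labels))
        tn = sum((not p) and y == 0 for p, y in zip(predicted, labels))
        if tp / positives >= target_sensitivity:
            best = max(best, tn / negatives)
    return best
```

With four metastatic and six non-metastatic patients whose scores interleave at one point, reaching 90% sensitivity requires catching all four positives, which costs one false positive and yields a specificity of 5/6.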
Table 1: The top ten scoring interactions. No. Rel indicates the number of the relation, while Pred refers to the predictor genes.
No. Rel   Pred         Target            Score
1         Metastasis   Contig63649_RC    0.000487
2         Metastasis   CCNE2             0.001081
3         UCH37        Contig40831_RC    0.003260
4         BBC3         PRC1              0.006655
5         BBC3         ORC6L             0.014038
6         WISP1        COL4A2            0.014892
7         DIAPH3       GNAZ              0.015032
8         MCM6         CCNE2             0.015653
9         DCK          Contig55377_RC    0.017289
10        HRASLS       FLJ22477          0.017314
Table 2: Inference network at sensitivity of 90%
Methods                                            Specificity   AUC       std
NIH 2000                                           0%            0.61905   0.16234
TNM Staging System                                 18%           0.71429   0.12747
St. Gallen                                         43%           0.73810   0.36204
Genetic markers                                    53%           0.79203   0.03245
Clinical and genetic markers (hybrid signatures)   77%           0.86438   0.02928
We observed that the St. Gallen criterion significantly outperformed both the TNM staging system and the NIH 2000
consensus, whereas the NIH 2000 consensus was worse than the TNM staging system. The St. Gallen criterion achieved
a specificity of 43%, while the TNM staging system and the NIH 2000 consensus obtained specificities of 18% and 0%,
respectively. This result is consistent with previous reports in the literature [11], in which the St. Gallen
criterion outperforms the other clinical indices in specificity. The clinical and genetic markers (hybrid
signatures), in turn, improve on the specificities of the genetic markers and the clinical markers (St. Gallen
criterion) by approximately 20% to 30%. We point out that our estimate of the specificity of the 70-gene signature
is worse than that reported in [5] (53% versus 73%), but is consistent with that in the follow-up validation on a
larger dataset [12] (53%). Furthermore, we measured the area under the curve (AUC) for all five methods, where a
higher AUC suggests a better inference network. Our results therefore show that combining clinical and genetic
markers improves the specificity of the inference network compared with networks based on genetic or clinical
markers alone.
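The AUC can be computed without tracing the ROC curve by using its rank interpretation: the probability that a randomly chosen metastatic patient receives a higher score than a randomly chosen non-metastatic one, with ties counted as one half. The sketch below uses that formulation; how the AUC and its standard deviation were actually computed in this study is not stated, so this is an assumption, and the function name is hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

A score that ranks every metastatic patient above every non-metastatic patient gives an AUC of 1, while a score unrelated to outcome gives a value near 0.5.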
4. Conclusion
Understanding the structure of the breast cancer progression network reveals the inherent biological information
flow and the interactions of various factors, which can lead to more effective therapies and disease treatments.
In this paper, we applied a computational model based on a synergy network to study breast cancer metastasis using
genetic and clinical markers. Different genes and clinical markers were found to be highly correlated with
metastasis. In future work, we intend to validate our findings using biological knowledge. This could arm
biologists with information about the upstream and downstream gene mechanisms, which would further clarify the
interactions in tumor progression.
References
[1] P. Boyle and B. Levin, "World cancer report 2008," International Agency for Research on Cancer, World
Health Organization, 2008.
[2] P. Edén, C. Ritz, C. Rose, M. Fernö, and C. Peterson, "“Good Old” clinical markers have similar power in
breast cancer prognosis as microarray gene expression profilers," European Journal of Cancer, vol. 40, pp.
1837-1841, 2004.
[3] O. Gevaert, F. D. Smet, D. Timmerman, Y. Moreau, and B. D. Moor, "Predicting the prognosis of breast
cancer by integrating clinical and microarray data with Bayesian networks," Bioinformatics, vol. 22, pp.
e184–e190, 2006.
[4] Y. Sun, S. Goodison, J. Li, L. Liu, and W. Farmerie, "Improved breast cancer prognosis through the
combination of clinical and genetic markers," Bioinformatics, vol. 23, pp. 30-37, 2007.
[5] L. J. van't Veer, H. Dai, M. J. van De Vijver, Y. D. He, A. M. Hart, M. Mao, et al., "Gene expression
profiling predicts clinical outcome of breast cancer," Nature, vol. 415, pp. 530-536, 2002.
[6] O. Troyanskaya, M. Cantor, G. Sherlock, P. Brown, T. Hastie, R. Tibshirani, et al., "Missing value
estimation methods for DNA microarrays," Bioinformatics, vol. 17, pp. 520–525, 2001.
[7] M. Kanehisa, S. Goto, M. Furumichi, M. Tanabe, and M. Hirakawa, "KEGG for representation and
analysis of molecular networks involving diseases and drugs," Nucleic Acids Research, pp. 1-6, 2009.
[8] G. Alexe, S. Alexe, D. E. Axelrod, T. O. Bonates, Lozina, M. Reiss, et al., "Breast cancer prognosis by
combinatorial analysis of gene expression data," Breast Cancer Research, vol. 8, pp. R41, 2006.
[9] Y. Sun, V. Urquidi, and S. Goodison, "Derivation of molecular signatures for breast cancer recurrence
prediction using a two-way validation approach," in Breast Cancer Research Treatment: Springer
Netherlands, 2009.
[10] A. M. Sieuwerts, M. P. Look, M. E. Gelder, M. Timmermans, A. A. C. Trapman, R. RodriguezGarcia, et
al., "Which Cyclin E prevails as prognostic marker for breast cancer? Results from a retrospective study
involving 635 lymph node negative breast cancer patients," Clinical Cancer Research, vol. 12, pp.
3319-3328, 2006.
[11] C. Lohrisch, J. Jackson, A. Jones, D. Mates, and I. A. Olivotto, "Relationship between tumor location and
relapse in 6,781 women with early invasive breast cancer," Journal of Clinical Oncology, vol. 18, pp.
2828-2835, 2000.
[12] M. J. van De Vijver, Y. D. He, L. J. van't Veer, H. Dai, A. M. Hart, D. W. Voskuil, et al., "A
gene-expression signature as a predictor of survival in breast cancer," The New England Journal of Medicine,
vol. 347, pp. 1999-2009, 2002.
1100 F.K. Ahmad et al. / Procedia Computer Science 3 (2011) 1094–1100
... The majority of published studies, regarding intelligent systems for cervical cancer support, are concerned about computer aided diagnosis systems based on either cytology or colposcopy image analysis [28][29][30][31]. On the other hand, various papers have been published in the past few years concerning bioinformatics' CDSSs based on ANNs for cancer improved detection, treatment, and follow-up support [32][33][34][35][36][37]. To the best of our knowledge, however, a similar bioinformatics intelligent CDSS for supporting and improving cervical cancer detection and triage, like the proposed system, has not been reported in the literature. ...
... Cytological findings of each patient were interpreted according to the Bethesda classification system and were classified as follows: (a) within normal limits (WNL); (b) atypical squamous cells of undetermined significance (ASCUS); (c) low-grade squamous intraepithelial lesion (LSIL); (d) highgrade squamous intraepithelial lesion (HSIL); (e) squamous cell carcinoma (SCC) or adenocarcinoma (Adeno-Ca). Regarding the HPV DNA test, for which the cytology laboratory is accredited by WHO and is proficient for the specific technique, we considered high-risk (HR) HPV types as HPV types 16,18,26,31,33,35,39,45,51,52,53,56,58,59,66,68,73,82, and 85; and low-risk (LR) HPV types as HPV types 6, 11, 40, 42, 43, 44, 54, 61, 62, 70, 71, 72, 81, 83, 84, and 89 [3, 4]. It is well known that the probability of a lowrisk subtype to cause cervical lesions is very small; however, the specific HPV DNA test is simultaneously identifying both high-risk and low-risk HPV subtypes and thus we used all available typing results during the development of the system, in order to evaluate its performance based on all available information. ...
Article
Full-text available
NOWADAYS, THERE ARE MOLECULAR BIOLOGY TECHNIQUES PROVIDING INFORMATION RELATED TO CERVICAL CANCER AND ITS CAUSE: the human Papillomavirus (HPV), including DNA microarrays identifying HPV subtypes, mRNA techniques such as nucleic acid based amplification or flow cytometry identifying E6/E7 oncogenes, and immunocytochemistry techniques such as overexpression of p16. Each one of these techniques has its own performance, limitations and advantages, thus a combinatorial approach via computational intelligence methods could exploit the benefits of each method and produce more accurate results. In this article we propose a clinical decision support system (CDSS), composed by artificial neural networks, intelligently combining the results of classic and ancillary techniques for diagnostic accuracy improvement. We evaluated this method on 740 cases with complete series of cytological assessment, molecular tests, and colposcopy examination. The CDSS demonstrated high sensitivity (89.4%), high specificity (97.1%), high positive predictive value (89.4%), and high negative predictive value (97.1%), for detecting cervical intraepithelial neoplasia grade 2 or worse (CIN2+). In comparison to the tests involved in this study and their combinations, the CDSS produced the most balanced results in terms of sensitivity, specificity, PPV, and NPV. The proposed system may reduce the referral rate for colposcopy and guide personalised management and therapeutic interventions.
... Recently, the identification of the various signaling pathways implicated in the cellular processes of breast cancer cells has drawn the attention of researchers worldwide. The involvement of growth factors or signaling molecules in breast cancer cell proliferation and invasion has also been reported (Ahmad et al. 2011;Cabioglu et al. 2009). ...
Article
Full-text available
Atypical chemokine receptor proteins are termed ‘decoy proteins’ as their binding to the respective ligands does not lead to a typical signaling pathway but intercepts the action of chemokines. This method of chemokine activity regulation may also function in tumor suppression. D6 and DARC (Duffy Antigen Receptor for Chemokines) have been reported as decoy chemokine receptors in cancer studies. Purified Pichia-expressed D6 and DARC, produced in-house, were used in cell-based studies to test their biological activities. Cell viability tests showed that recombinant D6 and DARC did not affect cell viability significantly, suggesting that they were not involved in breast cancer cell death. Wound healing assays showed that the presence of recombinant D6 or DARC at 10 µg/mL optimally inhibited the migration of breast cancer cells. ELISA showed an inverse relationship between the recombinant proteins and CCL2 levels in the treated cells. Migration assay using Boyden chamber demonstrated the function of the recombinant proteins in inhibiting chemotaxis activity of treated cells. Invasion assay showed the ability of the recombinant proteins in inhibiting the invasion property of treated cells. Comparison of single and combinatorial effects of the recombinant proteins showed that the combination of D6 and DARC at a 1:1 ratio (10 µg/mL) is most effective in reducing CCL2 levels and inhibiting the migration and invasion of treated cells. It was shown that the purified Pichia-expressed recombinant D6 and DARC are the negative regulators of breast cancer cell migration and invasion, and the inhibition effects were greater when they were used in combination.
... The construction of gene regulatory networks (GRNs) from microarray data has made it possible to measure the global response of a biological system to specific interventions. Even though microarray data are pre-processed prior to subsequent analysis, inferring a GRN is a complex task and remains an open research question, as complete knowledge of mammalian cellular biology is not yet available [10]. Therefore, one cannot infer the genomic interactions given the activity of each molecule at a time. ...
Article
Background: Breast cancer and its treatment can have an impact on health-related quality of life and survival. Tumour profiling tests aim to identify whether or not women need chemotherapy owing to their risk of relapse. Objectives: To conduct a systematic review of the effectiveness and cost-effectiveness of the tumour profiling tests Oncotype DX® (Genomic Health, Inc., Redwood City, CA, USA), MammaPrint® (Agendia, Inc., Amsterdam, the Netherlands), Prosigna® (NanoString Technologies, Inc., Seattle, WA, USA), EndoPredict® (Myriad Genetics Ltd, London, UK) and immunohistochemistry 4 (IHC4). To develop a health economic model to assess the cost-effectiveness of these tests compared with clinical tools to guide the use of adjuvant chemotherapy in early-stage breast cancer from the perspective of the NHS and Personal Social Services. Design: A systematic review and health economic analysis were conducted. Review methods: The systematic review was partially an update of a 2013 review. Nine databases were searched in February 2017. The review included studies assessing clinical effectiveness in people with oestrogen receptor-positive, human epidermal growth factor receptor 2-negative, stage I or II cancer with zero to three positive lymph nodes. The economic analysis included a review of existing analyses and the development of a de novo model. Results: A total of 153 studies were identified. Only one completed randomised controlled trial (RCT) using a tumour profiling test in clinical practice was identified: Microarray In Node-negative Disease may Avoid ChemoTherapy (MINDACT) for MammaPrint. Other studies suggest that all the tests can provide information on the risk of relapse; however, results were more varied in lymph node-positive (LN+) patients than in lymph node-negative (LN0) patients. There is limited and varying evidence that Oncotype DX and MammaPrint can predict benefit from chemotherapy.
The net change in the percentage of patients with a chemotherapy recommendation or decision pre/post test ranged from an increase of 1% to a decrease of 23% among UK studies and a decrease of 0% to 64% across European studies. The health economic analysis suggests that the incremental cost-effectiveness ratios for the tests versus current practice are broadly favourable for the following scenarios: (1) Oncotype DX, for the LN0 subgroup with a Nottingham Prognostic Index (NPI) of > 3.4 and the one to three positive lymph nodes (LN1–3) subgroup (if a predictive benefit is assumed); (2) IHC4 plus clinical factors (IHC4+C), for all patient subgroups; (3) Prosigna, for the LN0 subgroup with an NPI of > 3.4 and the LN1–3 subgroup; (4) EndoPredict Clinical, for the LN1–3 subgroup only; and (5) MammaPrint, for no subgroups. Limitations: There was only one completed RCT using a tumour profiling test in clinical practice. Except for Oncotype DX in the LN0 group with an NPI score of > 3.4 (clinical intermediate risk), evidence surrounding pre- and post-test chemotherapy probabilities is subject to considerable uncertainty. There is uncertainty regarding whether or not Oncotype DX and MammaPrint are predictive of chemotherapy benefit. The MammaPrint analysis uses a different data source to the other four tests. The Translational substudy of the Arimidex, Tamoxifen, Alone or in Combination (TransATAC) study (used in the economic modelling) has a number of limitations. Conclusions: The review suggests that all the tests can provide prognostic information on the risk of relapse; results were more varied in LN+ patients than in LN0 patients. There is limited and varying evidence that Oncotype DX and MammaPrint are predictive of chemotherapy benefit. Health economic analyses indicate that some tests may have a favourable cost-effectiveness profile for certain patient subgroups; all estimates are subject to uncertainty.
More evidence is needed on the prediction of chemotherapy benefit, long-term impacts and changes in UK pre-/post-chemotherapy decisions. Study registration: This study is registered as PROSPERO CRD42017059561. Funding: The National Institute for Health Research Health Technology Assessment programme.
Article
D6, which is also known as CCBP2, is one of the decoy chemokine receptors. It was recently found to play a role in the progression of breast cancer cells. In this study, the presence of D6 in the invasive breast cancer cell line MDA-MB-231 was investigated by one-step RT-PCR with additional Pfu DNA polymerase in the reaction. The amplicons were then sequenced and compared with the reference sequence from the GenBank database. Nucleotide sequence analysis showed that the amplicon sequence matches the reference sequence, confirming that the full-length D6 sequence had been amplified from MDA-MB-231. Index Terms—Cloning, D6, DNA sequencing, MDA-MB-231.
Article
Motivation: Gene expression microarray experiments can generate data sets with multiple missing expression values. Unfortunately, many algorithms for gene expression analysis require a complete matrix of gene array values as input. For example, methods such as hierarchical clustering and K-means clustering are not robust to missing data, and may lose effectiveness even with a few missing values. Methods for imputing missing data are needed, therefore, to minimize the effect of incomplete data sets on analyses, and to increase the range of data sets to which these algorithms can be applied. In this report, we investigate automated methods for estimating missing data. Results: We present a comparative study of several methods for the estimation of missing values in gene microarray data. We implemented and evaluated three methods: a Singular Value Decomposition (SVD) based method (SVDimpute), weighted K-nearest neighbors (KNNimpute), and row average. We evaluated the methods using a variety of parameter settings and over different real data sets, and assessed the robustness of the imputation methods to the amount of missing data over the range of 1–20% missing values. We show that KNNimpute appears to provide a more robust and sensitive method for missing value estimation than SVDimpute, and both SVDimpute and KNNimpute surpass the commonly used row average method (as well as filling missing values with zeros). We report results of the comparative experiments and provide recommendations and tools for accurate estimation of missing microarray data under a variety of conditions.
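To make the idea of weighted K-nearest-neighbour imputation concrete, the following is a minimal sketch and not the published KNNimpute implementation. The function name `knn_impute`, the root-mean-square distance over shared columns, and the inverse-distance weighting are all assumptions chosen for illustration; the abstract itself only states that a weighted K-nearest-neighbours method was used.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries in a genes-by-samples matrix (illustrative sketch).

    For each missing value, the k most similar rows that have a value in
    the same column are averaged, weighted by inverse distance.
    """
    filled = [row[:] for row in matrix]  # copy so the input is not mutated
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for other_idx, other in enumerate(matrix):
                if other_idx == i or other[j] is None:
                    continue
                # Compare the two rows only on columns observed in both.
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            if not candidates:
                continue  # no usable neighbour: leave the value missing
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            # Assumed weighting: inverse distance, with a small offset
            # to avoid division by zero for identical rows.
            weights = [1.0 / (d + 1e-9) for d, _ in nearest]
            filled[i][j] = (
                sum(w * v for w, (_, v) in zip(weights, nearest))
                / sum(weights))
    return filled
```

With `k=1` the estimate is simply the value from the single closest row; larger `k` trades sensitivity to that one neighbour for a smoother, averaged estimate.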
Article
Most human diseases are complex multi-factorial diseases resulting from the combination of various genetic and environmental factors. In the KEGG database resource (http://www.genome.jp/kegg/), diseases are viewed as perturbed states of the molecular system, and drugs as perturbants to the molecular system. Disease information is computerized in two forms: pathway maps and gene/molecule lists. The KEGG PATHWAY database contains pathway maps for the molecular systems in both normal and perturbed states. In the KEGG DISEASE database, each disease is represented by a list of known disease genes, any known environmental factors at the molecular level, diagnostic markers and therapeutic drugs, which may reflect the underlying molecular system. The KEGG DRUG database contains chemical structures and/or chemical components of all drugs in Japan, including crude drugs and TCM (Traditional Chinese Medicine) formulas, and drugs in the USA and Europe. This database also captures knowledge about two types of molecular networks: the interaction network with target molecules, metabolizing enzymes, other drugs, etc. and the chemical structure transformation network in the history of drug development. The new disease/drug information resource named KEGG MEDICUS can be used as a reference knowledge base for computational analysis of molecular networks, especially, by integrating large-scale experimental datasets.
Article
Breast cancer patients with the same stage of disease can have markedly different treatment responses and overall outcome. The strongest predictors for metastases (for example, lymph node status and histological grade) fail to classify accurately breast tumours according to their clinical behaviour. Chemotherapy or hormonal therapy reduces the risk of distant metastases by approximately one-third; however, 70-80% of patients receiving this treatment would have survived without it. None of the signatures of breast cancer gene expression reported to date allow for patient-tailored therapy strategies. Here we used DNA microarray analysis on primary breast tumours of 117 young patients, and applied supervised classification to identify a gene expression signature strongly predictive of a short interval to distant metastases ('poor prognosis' signature) in patients without tumour cells in local lymph nodes at diagnosis (lymph node negative). In addition, we established a signature that identifies tumours of BRCA1 carriers. The poor prognosis signature consists of genes regulating cell cycle, invasion, metastasis and angiogenesis. This gene expression profile will outperform all currently used clinical parameters in predicting disease outcome. Our findings provide a strategy to select patients who would benefit from adjuvant therapy.
Article
A more accurate means of prognostication in breast cancer will improve the selection of patients for adjuvant systemic therapy. Using microarray analysis to evaluate our previously established 70-gene prognosis profile, we classified a series of 295 consecutive patients with primary breast carcinomas as having a gene-expression signature associated with either a poor prognosis or a good prognosis. All patients had stage I or II breast cancer and were younger than 53 years old; 151 had lymph-node-negative disease, and 144 had lymph-node-positive disease. We evaluated the predictive power of the prognosis profile using univariable and multivariable statistical analyses. Among the 295 patients, 180 had a poor-prognosis signature and 115 had a good-prognosis signature, and the mean (±SE) overall 10-year survival rates were 54.6±4.4 percent and 94.5±2.6 percent, respectively. At 10 years, the probability of remaining free of distant metastases was 50.6±4.5 percent in the group with a poor-prognosis signature and 85.2±4.3 percent in the group with a good-prognosis signature. The estimated hazard ratio for distant metastases in the group with a poor-prognosis signature, as compared with the group with the good-prognosis signature, was 5.1 (95 percent confidence interval, 2.9 to 9.0; P<0.001). This ratio remained significant when the groups were analyzed according to lymph-node status. Multivariable Cox regression analysis showed that the prognosis profile was a strong independent factor in predicting disease outcome. The gene-expression profile we studied is a more powerful predictor of the outcome of disease in young patients with breast cancer than standard systems based on clinical and histologic criteria.
Article
Gruvberger et al. postulate, in their commentary published in this issue of Breast Cancer Research, that our “prognostic gene set may not be broadly applicable to other breast tumor cohorts”, and they suggest that “it may be important to define prognostic expression profiles separately in estrogen receptor (ER) positive and negative tumors”. This is based on two observations derived from our gene expression profiling data in breast cancer: the overlap between reporter genes for prognosis and ER status, and Gruvberger et al.’s inability to confirm the prognosis prediction using a nonoptimal selection of 58 of our 231 prognosis reporter genes.
Article
Twenty years ago the Human Genome Project was initiated aiming to uncover the genetic factors of human diseases and to develop new strategies for diagnosis, treatment, and prevention. Despite the successful sequencing of the human genome and the discovery of many disease related genes, our understanding of molecular mechanisms is still largely incomplete for the majority of diseases. In the KEGG database project we have been organizing our knowledge on cellular functions and organism behaviors in computable forms, especially in the forms of molecular networks (KEGG pathway maps) and hierarchical lists (BRITE functional hierarchies). The computerized knowledge has been widely used as a reference for biological interpretation of large-scale datasets generated by sequencing and other high-throughput experimental technologies. Our efforts are now focused on human diseases and drugs. We consider diseases as perturbed states of the molecular system that operates the cell and the organism, and drugs as perturbants to the molecular system. Since the existing disease databases are mostly for humans to read and understand, we develop a more computable disease information resource where our knowledge on diseases is represented as molecular networks or gene/molecule lists. When the detail of the molecular system is relatively well characterized, we use the molecular network representation and draw KEGG pathway maps. The Human Diseases category of the KEGG PATHWAY database contains about 40 pathway maps for cancers, immune disorders, neurodegenerative diseases, etc. When the detail is not known but disease genes are identified, we use the gene/molecule list representation and create a KEGG DISEASE entry. The entry contains a list of known disease genes and other relevant molecules including environmental factors, diagnostic markers, and therapeutic drugs. The list simply defines the membership to the underlying molecular system, but is still useful for computational analysis. 
In the KEGG DRUG database we capture knowledge on two types of molecular networks. One is the interaction network of drugs with target molecules, metabolizing enzymes, transporters, other drugs, and the pathways involving all these molecules. The other is the chemical structure transformation network representing the biosynthetic pathways of natural products in various organisms, as well as the history of drug development where drug structures have been continuously modified by medicinal chemists. KEGG DRUG contains chemical structures and/or chemical components of all prescription and OTC drugs in Japan including crude drugs and TCM (Traditional Chinese Medicine) formulas, as well as most prescription drugs in USA and many prescription drugs in Europe. I will report on our strategy to analyze the chemical architecture of natural products derived from enzymatic reactions (and enzyme genes) and the chemical architecture of marketed drugs derived from human made organic reactions in the history of drug development, towards drug discovery from the genomes of plants and microorganisms.
Article
Previous studies have demonstrated the potential value of gene expression signatures in assessing the risk of post-surgical breast cancer recurrence; however, many of these predictive models have been derived using simple computational algorithms and validated internally or using one-way validation on a single dataset. We have recently developed a new feature selection algorithm that overcomes some limitations inherent to high-dimensional data analysis. In this study, we applied this algorithm to two publicly available gene expression datasets obtained from over 400 patients with breast cancer to investigate whether we could derive more accurate prognostic signatures and reveal common predictive factors across independent datasets. We compared the performance of three advanced computational algorithms using a robust two-way validation method, where one dataset was used for training and to establish a prediction model that was then blindly tested on the other dataset. The experiment was then repeated in the reverse direction. Analyses identified prognostic signatures that, while comprising only 10-13 genes, significantly outperformed previously reported signatures for breast cancer evaluation. The cross-validation approach revealed CEGP1 and PRAME as major candidates for breast cancer biomarker development.
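The two-way validation scheme described above (train on one dataset, test blindly on the other, then swap) can be sketched as follows. This is an illustration under stated assumptions: the nearest-centroid classifier is a stand-in and is not the feature selection algorithm or any of the three algorithms from the study, and all function names here are hypothetical.

```python
def fit_centroids(samples, labels):
    """Return the mean feature vector per class (stand-in classifier)."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for idx, v in enumerate(x):
            acc[idx] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))


def accuracy(centroids, samples, labels):
    """Fraction of samples whose predicted class matches the true label."""
    hits = sum(predict(centroids, x) == y for x, y in zip(samples, labels))
    return hits / len(labels)


def two_way_validation(dataset_a, dataset_b):
    """Train on A and test on B, then train on B and test on A."""
    (xa, ya), (xb, yb) = dataset_a, dataset_b
    a_to_b = accuracy(fit_centroids(xa, ya), xb, yb)
    b_to_a = accuracy(fit_centroids(xb, yb), xa, ya)
    return a_to_b, b_to_a
```

Reporting both directions separately, rather than a single pooled figure, makes it visible when a model generalises from one cohort to the other but not the reverse.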
Article
To explore the independent prognostic impact of medial hemisphere tumor location in early breast cancer. A comprehensive database was used to review patients referred to the British Columbia Cancer Agency from 1989 to 1995 with early breast cancer. Patients were grouped according to relapse risk (high or nonhigh) and adjuvant systemic therapy received. Multiple regression analysis was used to determine whether the significance of primary tumor location (medial v lateral hemisphere) was independent of known prognostic factors and treatment. In the adjuvant systemic therapy groups, medial location was associated with a 50% excess risk of systemic relapse and breast cancer death compared with lateral location. Five-year systemic disease-free survival rates were 66.3% and 74.2% for high-risk medial and lateral lesions, respectively (P <.005). Corresponding 5-year disease-specific survival rates were 75.7% and 80.8%, respectively (P <.03). No significant differences were observed between medial and lateral location for low-risk disease regardless of adjuvant therapy or for high-risk disease with no adjuvant therapy. Local recurrence rates were similar for all risk and therapy groups. The two-fold risk of relapse and breast cancer death associated with high-risk medial breast tumors may be due to occult spread to internal mammary nodes (IMNs). Enhanced local control, such as with irradiation of the IMN chain, may be one way to reduce the excess risk. Ongoing randomized controlled trials may provide prospective answers to the question of the optimal volume of radiotherapy.
Article
We compared the power of gene expression measurements with that of conventional prognostic markers, i.e., clinical, histopathological, and cell biological parameters, for predicting distant metastases in breast cancer patients using both established prognostic indices (e.g., the Nottingham Prognostic Index (NPI)) and novel combinations of conventional markers. We used publicly available data on 97 patients, and the performance of metastasis prediction was represented by receiver operating characteristic (ROC) areas and Kaplan-Meier plots. The gene expression profiler did not perform noticeably better than indices constructed from the clinical variables, e.g., the well-established NPI. When subgroups were analysed separately according to oestrogen receptor (ER) status, both approaches could predict clinical outcome more easily for the ER-positive than for the ER-negative cohort. Given the time it may take before microarray processing is used worldwide, particularly due to the costs and the lack of standards, it is important to pursue research using conventional markers. Our analysis suggests that it might be possible to improve the combination of different conventional prognostic markers into one prognostic index.
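Since the comparison above is summarised by ROC areas, a small sketch of how such an area can be computed may help. It uses the standard pairwise (Mann-Whitney) formulation: the area equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the 0/1 label encoding are assumptions for this example, not details taken from the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for cases with the event (e.g. distant metastasis), 0 otherwise.
    scores: higher values mean higher predicted risk.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))
```

An area of 0.5 corresponds to a prognostic index that ranks patients no better than chance, and 1.0 to perfect separation, so two indices can be compared on the same patients by comparing their areas.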
Article
To evaluate the prognostic value of cyclin E with a quantitative method for lymph node-negative primary breast cancer patients. mRNA transcripts of full-length and splice variants of cyclin E1 (CCNE1) and cyclin E2 (CCNE2) were measured by real-time PCR in frozen tumor samples from 635 lymph node-negative breast cancer patients who had not received neoadjuvant or adjuvant systemic therapy. None of the PCR assays designed for the specific splice variants of the cyclins gave additional prognosis-related information compared with the common assays able to detect all variants. In Cox multivariate analysis, corrected for the traditional prognostic factors, high levels of cyclin E were independently associated with a short distant metastasis-free survival [hazard ratio (HR), 3.40; P < 0.001 for CCNE1 and HR, 1.76; P < 0.001 for CCNE2, respectively]. After dichotomizing the tumors at the median level of 70% tumor cells, the multivariate analysis showed particularly strong results for CCNE1 in the group of 433 patients with stroma-enriched primary tumors (HR, 5.12; P < 0.001). In these tumors, the worst prognosis was found for patients with estrogen receptor-negative tumors expressing high CCNE1 (HR, 9.89; P < 0.001) and for patients with small (T1) tumors expressing high CCNE1 (HR, 8.47; P < 0.001). Our study shows that both CCNE1 and CCNE2 qualify as independent prognostic markers for lymph node-negative breast cancer patients, and that CCNE1 may provide additional information for specific subgroups of patients.