
Synergy network based inference for breast cancer metastasis

December 2011
Procedia Computer Science 3:1094-1100


DOI:10.1016/j.procs.2010.12.178

Source
DBLP

License
CC BY-NC-ND 3.0

Authors:

Farzana Kabir Ahmad

Universiti Utara Malaysia

Safaai Deris

University of Malaysia, Kelantan

Mohd Syazwan Abdullah

Universiti Utara Malaysia



Procedia Computer Science 00 (2010) 000–000


www.elsevier.com/locate/procedia

WCIT-2010

Synergy network based inference for breast cancer metastasis

Farzana Kabir Ahmad a,*, Safaai Deris b and Mohd. Syazwan Abdullah c

a, c Graduate Department of Computer Science, Universiti Utara Malaysia, 06010 Sintok, Kedah, Malaysia.

b Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia.

Abstract

Breast cancer is a leading cancer worldwide and is characterized by aggressive metastasis. In many patients, microscopic or clinically evident metastases have already occurred by the time the primary tumor is diagnosed. Chemotherapy or hormonal therapy reduces the risk of distant metastasis by one-third, but it is estimated that about 70% to 80% of patients receiving treatment would have survived without it. Being able to predict breast cancer metastasis can therefore spare a significant number of patients from unnecessary adjuvant systemic treatment and its associated costs. Current studies have demonstrated the potential value of gene expression signatures in assessing the risk of post-surgical disease recurrence. However, most of these studies attempt to develop genetic marker-based prognostic systems to replace the existing clinical criteria, while ignoring the rich information contained in established clinical markers. Clinical markers, such as patient history and laboratory analysis, which are the basis of day-to-day clinical decision support, are often underused to guide the clinical management of cancer in the presence of microarray data. Given the complexity of breast cancer prognosis, we propose a novel strategy based on a synergy network that uses both clinical and genetic markers to identify potential hybrid signatures and to investigate their interactions associated with breast cancer metastasis. The computational method is applied to publicly available microarray and clinical data. A rigorous experimental protocol is used to estimate the prognostic performance of the hybrid signature and other prognostic approaches. The hybrid signature performs significantly better than other methods, including the 70-gene signature, clinical markers alone and the St. Gallen consensus criterion. At the 90% sensitivity level, the hybrid signature achieves 77% specificity, compared to 53% for the 70-gene signature and 43% for the clinical markers. The predicted results also show a strong dependence on regulator genes that are related to cell death in the cell development process. These significant gene regulators are useful for understanding cancer biology and for designing new drugs.

Keywords: Synergy network; Bayesian network; breast cancer metastasis; inference; conditional independence.

1. Introduction

Breast cancer is a leading cause of cancer-related death and one of the most aggressively metastatic diseases worldwide. With 410,000 deaths each year, it accounts for more than 1.6% of all female deaths worldwide [1]. The major clinical problems of breast cancer are the recurrence of disseminated disease and metastatic behavior. In numerous patients, microscopic or clinically evident metastases have already occurred by the time the primary tumor is diagnosed. Although treatments such as chemotherapy and endocrine therapy can reduce the risk of distant metastasis by approximately one-third, it is estimated that 80% of patients would have survived without receiving them. Being prescribed expensive medicines that turn out to be unnecessary causes complications and can worsen the condition of breast cancer patients. As a result, the study of tumor progression and breast cancer metastasis has attracted great interest in the biomedical field.


1877-0509 © 2010 Published by Elsevier Ltd. Open access under CC BY-NC-ND license.

Selection and/or peer-review under responsibility of the Guest Editor.

doi:10.1016/j.procs.2010.12.178


Despite significant advances in the treatment of primary breast cancer and the large number of studies conducted, inferring the metastatic behavior of tumors remains one of the greatest clinical challenges in oncology. The main cause is the complexity of the interactions involved in cancer progression and metastasis formation. Traditionally, three treatment guidelines have been used to assess the risk of distant metastases: the TNM (tumor, lymph nodes and metastasis) Tumor Staging System, the St. Gallen consensus criterion and the NIH (National Institutes of Health) consensus criterion. These breast cancer indices are based on clinical markers such as tumor size, lymph node involvement, patient age and the aggressiveness of the cancer as determined by histopathological parameters. Although these indices are widely practiced, they predict therapy failure inaccurately, with only 10% specificity at the 90% sensitivity level. Thus, a more accurate prognostic criterion is urgently needed to avoid unnecessary treatment in newly diagnosed patients.

Recently, the development of genetic marker-based prognostic systems has become a breakthrough in cancer progression research, and most studies have concentrated their efforts solely on this approach. Yet some researchers believe that gene expression data are often overused to infer cancer progression when clinical data are also available [2]. Clinical data, which are used on a daily basis, have been neglected, and the rich information contained in established clinical markers has been ignored. Given the complexity of breast cancer metastasis, a more practical and sensible strategy is to incorporate both clinical and genetic markers, which may contain complementary information.

A small number of studies have examined the possibility of integrating clinical and genetic markers to infer breast cancer metastasis [3, 4]. While some of these approaches show great promise in combining the two types of markers, the issue of high-dimensional data has rarely been discussed. An important characteristic of microarray data is the extremely large number of variables measured on a very small number of samples. Integrating two types of markers to infer cancer metastasis can therefore be computationally complex, as a large number of variables may need to be examined. In this paper, we seek to improve the ability to infer breast cancer metastasis using a novel strategy known as a synergy network. This method is based solely on a Bayesian network and applies two different approaches: an information-theoretic approach and a conditional independence approach. Our aim is to obtain a correctly learnt network in order to examine the two types of markers, clinical and genetic, in the presence of a third variable that represents the state of the cell (metastasis). In addition, we score marker interactions, which provides insights into tumor progression and indicates the markers that most strongly regulate breast cancer metastasis.

The remainder of this paper is organized as follows. Section 2 describes the method used to develop the synergy network, which is based on a Bayesian network, to integrate the two diverse types of markers. This section also elaborates the approaches taken to learn the structure correctly in order to address the issue of high-dimensional data. The empirical results and discussion are presented in Section 3, while Section 4 provides concluding remarks.

2. Methods

2.1 Bayesian network

A Bayesian network is a probabilistic graphical model in which vertices represent random variables and the absence of an edge between two vertices represents conditional independence. Consider a finite set $V_n = \{X_1, X_2, \ldots, X_n\}$ of random variables. A Bayesian network representation contains two components: a directed acyclic graph (DAG) $G = (V_n, E_G)$, whose vertices correspond to the random variables, and the conditional probability distribution of each random variable given the variables it depends on (its parents) in $G$. The joint distribution is the product of these conditional probability distributions:

$$P(X_1, \ldots, X_n) = \prod_{X_i \in V_n} P\big(X_i \mid Pa(X_i)\big) \tag{1}$$

where $P(X_i \mid Pa(X_i))$ is the set of conditional probabilities for each variable $X_i$ and $Pa(X_i)$ is the set of variables that are the parents of $X_i$ in graph $G$.

We use an example to illustrate the basic idea of Bayesian networks. The Bayesian network in Fig. 1 for five genes, X1, X2, X3, X4 and X5, specifies the parents of genes X3, X4 and X5: Pa(X3) = {X1, X2}, Pa(X4) = {X1}, Pa(X5) = {X3}, where Pa(V) represents the parent vertex set of vertex V.
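As a worked illustration, Eq. (1) can be applied to this five-gene example. The factorization below assumes that X1 and X2 have no parents, which the text implies by listing parent sets only for X3, X4 and X5:

$$P(X_1, X_2, X_3, X_4, X_5) = P(X_1)\, P(X_2)\, P(X_3 \mid X_1, X_2)\, P(X_4 \mid X_1)\, P(X_5 \mid X_3)$$

Each factor conditions a gene only on its parents, so the joint distribution over five genes is specified by five small conditional tables rather than one table over all joint configurations.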


Fig. 1: A simple Bayesian network representation that explicates relationships between five genes

In our context, conditional independence of X and Y given Z means that when the genes in Z are at fixed expression levels, the expression levels of the genes in X give no information about the expression levels of the genes in Y, and vice versa. Once the structure of G is specified for a set of genes, a directed edge from X to Y in G can be interpreted as a statement that X is the “cause” of Y, that is, the expression level of X has an effect on the expression level of Y. Obtaining the correct structure of the Bayesian network is therefore essential for performing efficient inference and correctly representing the dependency relationships.

Determining the optimal network through Bayesian structure learning has been investigated for decades. With the advent of microarray technology, the problem of searching for the network that best fits a given dataset has become harder. Analyzing such high-dimensional data requires a large number of variables to be examined, which leads to exponential growth of the search space; the structure learning problem is known to be NP-hard.

Generally, there are two approaches to learning the structure of Bayesian networks from data: search-and-score methods and dependency analysis methods. In the first approach, the learning problem is viewed as a search for the structure that best fits the data. Different scoring methods have been applied to measure the fit between the network structure and the data, including Bayesian scoring, entropy-based scoring and minimum description length, among others. The dependency analysis approach, on the other hand, tries to discover the dependencies among variables from the data and then uses these dependencies to construct the network structure. The dependency analysis approach has been found to be more efficient than the search-and-score approach for sparse networks (where the number of edges in the graph is relatively small). To attain our goal of inferring breast cancer metastasis by integrating clinical and genetic markers, we propose a new strategy for finding the optimal structure. Section 2.2 introduces the main steps of the proposed method.

2.2 Synergy network based on information-theoretic approach and conditional independence

Clinical markers and gene expression profiles play an important role in determining breast cancer metastasis. Integrating and analyzing all this information to discover the factors that regulate cancer progression requires a network-based algorithm. In this study, we employ a novel strategy known as a synergy network to achieve this objective. The method is based solely on a Bayesian network and relies on two structure learning approaches: an information-theoretic approach and conditional independence. It has two features. In the first feature, the synergy network is built using mutual information to measure the cooperative effect of two variables on the state of a third. The two variables in this case are a gene and a clinical marker (Gi and Ci), and the third is the binary state variable M representing the occurrence of metastasis (M = 1, metastasis present; M = 0, metastasis not present). Mutual information is used to decide which interactions (edges) are more prominent than others. The mutual information between random variables X and Y is defined as follows:

$$I(X; Y) = H(X) - H(X \mid Y) \tag{2}$$

where $H(X)$ is the entropy of X and $H(X \mid Y)$ is the conditional entropy of X given Y. Applying the same rule to our study, we calculate the integration of genes and clinical markers (Gi and Ci) as:

$$I(G_i, C_i; M) - \big[ I(G_i; M) + I(C_i; M) \big] \tag{3}$$

where $I(G_i, C_i; M)$ is the cooperative effect of the two variables and $I(G_i; M) + I(C_i; M)$ is the individual effect.
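To make Eqs. (2) and (3) concrete, the following minimal sketch estimates entropy, mutual information and the synergy score from discrete samples. It assumes that expression values and clinical markers have already been discretized (the paper does not state how), that probabilities are estimated by simple counting, and that Eq. (3) is the joint term minus the sum of the individual terms. The function names and the toy data are illustrative only.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy H(X) in bits, estimated from observed frequencies."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X; Y) = H(X) - H(X | Y), with H(X | Y) = H(X, Y) - H(Y)."""
    joint = list(zip(x, y))
    return entropy(x) - (entropy(joint) - entropy(y))


def synergy(g, c, m):
    """I(G, C; M) - [I(G; M) + I(C; M)] for a gene G, clinical marker C, state M."""
    pair = list(zip(g, c))  # treat (G, C) as one joint variable
    return mutual_information(pair, m) - (
        mutual_information(g, m) + mutual_information(c, m)
    )


# Toy data (hypothetical): M is the XOR of G and C, so neither marker is
# informative alone but the pair determines M completely.
g = [0, 0, 1, 1]
c = [0, 1, 0, 1]
m = [0, 1, 1, 0]

print(mutual_information(g, m))  # 0.0
print(synergy(g, c, m))          # 1.0
```

In the XOR toy case the individual effects are zero while the joint effect is one bit, which is the kind of purely cooperative interaction the synergy score is meant to expose.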

The proposed algorithm starts from an unconnected network, in which there are no edges between nodes. We then calculate the mutual information for each pair of nodes in the network and order the candidate edges by these values. Connections between nodes are drawn sequentially in order of mutual information, so that the highest-scoring interactions are considered first. Edges whose mutual information is below the threshold (threshold = 0.1) are excluded as candidate edges. Only edges with values above the threshold are retained as correct edges, as they have a higher probability of representing a true connection.
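The edge selection step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: variables are assumed to be discrete, mutual information is estimated by counting, and the names `candidate_edges` and `data` are hypothetical. Only the threshold value of 0.1 comes from the text.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(values):
    """Shannon entropy in bits, estimated from observed frequencies."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), equivalent to H(X) - H(X | Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def candidate_edges(data, threshold=0.1):
    """Start from an empty graph, score every pair of nodes by mutual
    information, drop pairs at or below the threshold and return the
    remaining edges ordered from highest to lowest score."""
    scored = []
    for a, b in combinations(sorted(data), 2):
        score = mutual_information(data[a], data[b])
        if score > threshold:
            scored.append((score, a, b))
    scored.sort(reverse=True)
    return scored


# Toy data (hypothetical): G2 copies G1, C1 is unrelated to both.
data = {
    "G1": [0, 0, 1, 1, 0, 1, 0, 1],
    "G2": [0, 0, 1, 1, 0, 1, 0, 1],
    "C1": [0, 1, 0, 1, 0, 0, 1, 1],
    "M":  [0, 0, 1, 1, 0, 1, 1, 1],
}

for score, a, b in candidate_edges(data):
    print(f"{a} - {b}: {score:.4f}")
```

On this toy input the three edges among G1, G2 and M survive, while every pair involving C1 falls below the threshold and is discarded. The edges returned here are undirected; orienting them is a separate step.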

In the second feature, we further examine the structure of the constructed network (obtained from the first feature) using a conditional independence approach. Two variables, for instance A and B, may be connected in different structures, A → B and A ← B, that carry identical mutual information values. To overcome this issue, the algorithm uses conditional independence to search for incorrect edges in a triangular structure, as depicted in Fig. 2. Conditional independence is defined as follows:

$$P(X_i, X_j \mid X_k) = P(X_i \mid X_k)\, P(X_j \mid X_k) \tag{4}$$

Hence, once we detect edges that form a triangle loop and hold the same mutual information value, all three edges in the triangle are tested using Eq. (4), and the results of these tests are used to update the network.
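One way to check Eq. (4) on data is through the conditional mutual information I(Xi; Xj | Xk), which is zero exactly when the empirical distribution satisfies the equality. The sketch below takes that route. It is an assumption about how the test could be carried out, since the paper does not describe the procedure; the tolerance `tol` is likewise hypothetical, and on real, finite samples a statistical test would be needed instead of a fixed tolerance.

```python
from collections import Counter
from math import log2


def conditional_mutual_information(x, y, z):
    """I(X; Y | Z) in bits from empirical counts.

    It equals zero exactly when P(x, y | z) = P(x | z) P(y | z) holds for
    every observed combination, which is the condition in Eq. (4).
    """
    n = len(z)
    n_xyz = Counter(zip(x, y, z))
    n_xz = Counter(zip(x, z))
    n_yz = Counter(zip(y, z))
    n_z = Counter(z)
    total = 0.0
    for (xi, yi, zi), count in n_xyz.items():
        # P(x,y|z) / (P(x|z) P(y|z)) written in terms of raw counts
        ratio = count * n_z[zi] / (n_xz[(xi, zi)] * n_yz[(yi, zi)])
        total += (count / n) * log2(ratio)
    return total


def conditionally_independent(x, y, z, tol=1e-9):
    """True when X and Y look independent given Z on this sample."""
    return conditional_mutual_information(x, y, z) <= tol


# Toy data (hypothetical): within each value of z, x and y take all four
# combinations equally often, so they are independent given z.
z = [0, 0, 0, 0, 1, 1, 1, 1]
x = [0, 0, 1, 1, 0, 0, 1, 1]
y = [0, 1, 0, 1, 0, 1, 0, 1]

print(conditionally_independent(x, y, z))  # True
print(conditionally_independent(x, x, z))  # False
```

In a triangle of three nodes, applying such a check to each pair given the third node indicates which edge is explained away by the other two and is therefore a candidate for removal.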

Fig. 2: Triangular structure of three nodes and two edges

2.3 Dataset and Pre-processing

The proposed method is tested and analyzed on the dataset of van’t Veer et al. [5], which was obtained from the Integrated Tumor Transcriptome Array and Clinical data Analysis database (ITTACA, 2006). This dataset contains expression profiles derived from 97 lymph node-negative breast cancer patients aged 55 years or younger, together with associated clinical information including age, tumor size, histological grade, angioinvasion, lymphocytic infiltration, and estrogen receptor (ER) and progesterone receptor (PR) status, which together form the clinical markers.

Prior to applying the proposed method, the missing values in this dataset were addressed. Missing gene expression values are a common problem in microarray datasets. The k-nearest neighbors (kNN) imputation method with k = 10 was used to handle these missing values. kNN imputation was chosen because Troyanskaya et al. [6] found it to be a robust and sensitive approach for estimating missing values in microarray data. The processed dataset was then used as input to the proposed method.
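The idea behind kNN imputation can be sketched in a few lines. This is a simplified illustration, not the procedure of [6] or the authors' code: it assumes genes are rows and samples are columns, marks missing entries with `None`, measures distance as the root mean squared difference over the samples two genes share, and fills a gap with the unweighted mean of the k nearest genes. The function name `knn_impute` and the toy matrix are hypothetical.

```python
from math import sqrt


def knn_impute(rows, k=10):
    """Fill missing entries (None) in a gene-by-sample matrix.

    For each missing value, the k genes most similar to the incomplete
    gene (among those observed in that sample) are found, and the gap
    is filled with the mean of their values in that sample.
    """
    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            scored = []
            for other_i, other in enumerate(rows):
                if other_i == i or other[j] is None:
                    continue
                # Compare the two genes only on samples observed in both.
                shared = [
                    (a, b)
                    for a, b in zip(row, other)
                    if a is not None and b is not None
                ]
                if not shared:
                    continue
                dist = sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                scored.append((dist, other[j]))
            scored.sort(key=lambda item: item[0])
            nearest = [v for _, v in scored[:k]]
            if nearest:  # leave the gap as None if no neighbour is usable
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Toy matrix (hypothetical): gene 0 is missing its value in sample 2.
genes = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.2],
    [9.0, 9.0, 9.0],
]

filled = knn_impute(genes, k=2)
print(filled[0])
```

With k = 2 the two closest genes (rows 1 and 2) supply the estimate, so the distant row 3 has no influence on the filled value. The study itself uses k = 10.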

3. Results and Discussion

The proposed method was executed on the breast cancer dataset to gain insights into cancer development and into how various factors may trigger metastasis, producing the synergy network shown in Fig. 3. The top ten scoring interactions for this network are given in Table 1. The learned network reveals a group of genes and clinical markers that are primarily associated with metastasis, M. The larger nodes in the graph denote genes whose expression at different levels has a major effect on the status of other genes (e.g., on or off) or clinical markers, and the light-shaded nodes denote highly regulated genes. Four genes are found to regulate the expression levels of other genes: BBC3, GNAZ, TSPY-like 5 (TSPYL5) and DCK. Two genes are highly regulated: FLJ11354 and CCNE2. Meanwhile, angioinvasion is identified as a strong factor in breast cancer metastasis. The network involves 50 genes and 6 clinical markers that are closely associated with breast cancer metastasis, M.


Fig. 3: Synergy network for breast cancer metastasis

The constructed network indicates that the BBC3 gene has a prominent role in regulating other genes; eight genes are correlated with BBC3. BBC3, also known as PUMA, is activated by the tumor suppressor p53, which is a key regulator of apoptosis and tumorigenesis in breast cancer. There is insufficient information about whether GNAZ directly regulates the progression of breast cancer; however, it is reported to have an essential role in cellular processes of the nervous system [7]. TSPYL5 has been identified as a genetic marker for breast cancer in several studies [4, 8]. Lastly, DCK is associated with resistance to antiviral and anticancer chemotherapeutic agents, so this gene is clinically important because of its relationship to drug resistance and sensitivity. Besides these regulator genes, our analysis identified two highly regulated genes: FLJ11354 and CCNE2. The FLJ11354 gene was reported by Sun et al. [9], while CCNE2 has been reported to qualify as an independent prognostic marker for lymph node-negative breast cancer patients [10]. Among the clinical markers, angioinvasion is identified as the critical factor contributing to cancer progression.

We further evaluated the performance of the constructed synergy network using a receiver operating characteristic (ROC) curve obtained by varying a decision threshold, which gives a direct view of how the inference network performs at different sensitivity and specificity levels. Following van’t Veer and colleagues [5], the sensitivity is set to 90%. The corresponding specificities are computed and reported in Table 2. For comparison, the specificities of the TNM Tumor Staging System and of the St. Gallen and NIH consensus criteria are also reported.
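The evaluation just described, reading off the specificity at a fixed sensitivity and summarizing the ROC curve by its area, can be sketched as below. This is a minimal illustration under assumptions: the classifier is taken to output a score where higher means metastasis is more likely, labels are 1 for metastasis and 0 otherwise, and the function names and toy scores are hypothetical rather than taken from the study.

```python
def specificity_at_sensitivity(scores, labels, target=0.90):
    """Best specificity among decision thresholds whose sensitivity
    is at least `target`. A sample is called positive if score >= threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    best = 0.0
    for threshold in sorted(set(scores)):
        called = [s >= threshold for s in scores]
        tp = sum(1 for p, y in zip(called, labels) if p and y == 1)
        tn = sum(1 for p, y in zip(called, labels) if not p and y == 0)
        if tp / positives >= target:
            best = max(best, tn / negatives)
    return best


def auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a
    randomly chosen positive scores higher than a randomly chosen
    negative (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg
    )
    return wins / (len(pos) * len(neg))


# Toy scores (hypothetical): five metastasis cases and five non-metastasis cases.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

print(specificity_at_sensitivity(scores, labels))  # 0.6
print(auc(scores, labels))                         # 0.88
```

Fixing sensitivity first reflects the clinical priority of missing few patients who will develop metastasis; specificity then measures how many patients could be spared unnecessary treatment at that operating point.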

Table 1: The top ten scoring interactions. No. Rel indicates the number of the relation involved, while Pred refers to predictor genes.

No. Rel   Pred         Target            Score
1         Metastasis   Contig63649_RC    0.000487
2         Metastasis   CCNE2             0.001081
3         UCH37        Contig40831_RC    0.003260
4         BBC3         PRC1              0.006655
5         BBC3         ORC6L             0.014038
6         WISP1        COL4A2            0.014892
7         DIAPH3       GNAZ              0.015032
8         MCM6         CCNE2             0.015653
9         DCK          Contig55377_RC    0.017289
10        HRASLS       FLJ22477          0.017314

Table 2: Inference network at sensitivity of 90%

Methods                                            Specificity   AUC       std
NIH 2000                                           0%            0.61905   0.16234
TNM Staging System                                 18%           0.71429   0.12747
St. Gallen                                         43%           0.73810   0.36204
Genetic markers                                    53%           0.79203   0.03245
Clinical and genetic markers (hybrid signatures)   77%           0.86438   0.02928

We observed that the St. Gallen criterion significantly outperformed both the TNM staging system and the NIH 2000 consensus, and that the NIH 2000 consensus was worse than the TNM staging system. The St. Gallen criterion achieved a specificity of 43%, while the TNM staging system and the NIH 2000 consensus obtained specificities of 18% and 0%, respectively. This result is consistent with previous reports in the literature [11], in which the specificity of the St. Gallen criterion exceeds that of the other clinical indices. The clinical and genetic markers combined (hybrid signatures) reach a specificity of 77%, improving on the genetic markers (53%) and the clinical markers (St. Gallen criterion, 43%) by 24 and 34 percentage points, respectively. We point out that our estimate of the specificity of the 70-gene signature is worse than that reported in [5] (53% versus 73%), but is consistent with that of the follow-up validation on a larger dataset [12] (53%). Furthermore, we measured the area under the curve (AUC) for all five methods, where a higher AUC suggests a better inference network; the hybrid signatures obtained the highest AUC (0.86438). Our results therefore show that combining clinical and genetic markers improves the specificity of the inference network compared with networks based on genetic or clinical markers alone.

4. Conclusion

Understanding the structure of the breast cancer progression network reveals the inherent flow of biological information and the interactions of various factors, which can lead to more effective therapies and disease treatments. In this paper, we applied a computational model based on a synergy network to study breast cancer metastasis using genetic and clinical markers. Different genes and clinical markers were found to be highly correlated with metastasis. For future work, we intend to validate our findings using biological knowledge. This could arm biologists with information about the upstream and downstream gene mechanisms, which would further illuminate the interactions in tumor progression.

References

[1] P. Boyle and B. Levin, "World cancer report 2008," International Agency for Research on Cancer, World Health Organization, 2008.

[2] P. Edén, C. Ritz, C. Rose, M. Fernö, and C. Peterson, "“Good Old” clinical markers have similar power in breast cancer prognosis as microarray gene expression profilers," European Journal of Cancer, vol. 40, pp. 1837-1841, 2004.

[3] O. Gevaert, F. D. Smet, D. Timmerman, Y. Moreau, and B. D. Moor, "Predicting the prognosis of breast cancer by integrating clinical and microarray data with Bayesian networks," Bioinformatics, vol. 22, pp. e184–e190, 2006.

[4] Y. Sun, S. Goodison, J. Li, L. Liu, and W. Farmerie, "Improved breast cancer prognosis through the combination of clinical and genetic markers," Bioinformatics, vol. 23, pp. 30-37, 2007.

[5] L. J. van't Veer, H. Dai, M. J. van De Vijver, Y. D. He, A. M. Hart, M. Mao, et al., "Gene expression profiling predicts clinical outcome of breast cancer," Nature, vol. 415, pp. 530-536, 2002.

[6] O. Troyanskaya, M. Cantor, G. Sherlock, P. Brown, T. Hastie, R. Tibshirani, et al., "Missing value estimation methods for DNA microarrays," Bioinformatics, vol. 17, pp. 520–525, 2001.

[7] M. Kanehisa, S. Goto, M. Furumichi, M. Tanabe, and M. Hirakawa, "KEGG for representation and analysis of molecular networks involving diseases and drugs," Nucleic Acids Research, pp. 1-6, 2009.

[8] G. Alexe, S. Alexe, D. E. Axelrod, T. O. Bonates, Lozina, M. Reiss, et al., "Breast cancer prognosis by combinatorial analysis of gene expression data," Breast Cancer Research, vol. 8, pp. R41, 2006.

[9] Y. Sun, V. Urquidi, and S. Goodison, "Derivation of molecular signatures for breast cancer recurrence prediction using a two-way validation approach," in Breast Cancer Research Treatment: Springer Netherlands, 2009.

[10] A. M. Sieuwerts, M. P. Look, M. E. Gelder, M. Timmermans, A. A. C. Trapman, R. RodriguezGarcia, et al., "Which Cyclin E prevails as prognostic marker for breast cancer? Results from a retrospective study involving 635 lymph node negative breast cancer patients," Clinical Cancer Research, vol. 12, pp. 3319-3328, 2006.

[11] C. Lohrisch, J. Jackson, A. Jones, D. Mates, and I. A. Olivotto, "Relationship between tumor location and relapse in 6,781 women with early invasive breast cancer," Journal of Clinical Oncology, vol. 18, pp. 2828-2835, 2000.

[12] M. J. van De Vijver, Y. D. He, L. J. van't Veer, H. Dai, A. M. Hart, D. W. Voskuil, et al., "A gene-expression signature as a predictor of survival in breast cancer," The New England Journal of Medicine, vol. 347, pp. 1999-2009, 2002.

1100 F.K. Ahmad et al. / Procedia Computer Science 3 (2011) 1094–1100

An Intelligent Clinical Decision Support System for Patient-Specific Predictions to Improve Cervical Intraepithelial Neoplasia Detection

Article

Full-text available

Apr 2014
BMRI

NOWADAYS, THERE ARE MOLECULAR BIOLOGY TECHNIQUES PROVIDING INFORMATION RELATED TO CERVICAL CANCER AND ITS CAUSE: the human Papillomavirus (HPV), including DNA microarrays identifying HPV subtypes, mRNA techniques such as nucleic acid based amplification or flow cytometry identifying E6/E7 oncogenes, and immunocytochemistry techniques such as overexpression of p16. Each one of these techniques has its own performance, limitations and advantages, thus a combinatorial approach via computational intelligence methods could exploit the benefits of each method and produce more accurate results. In this article we propose a clinical decision support system (CDSS), composed by artificial neural networks, intelligently combining the results of classic and ancillary techniques for diagnostic accuracy improvement. We evaluated this method on 740 cases with complete series of cytological assessment, molecular tests, and colposcopy examination. The CDSS demonstrated high sensitivity (89.4%), high specificity (97.1%), high positive predictive value (89.4%), and high negative predictive value (97.1%), for detecting cervical intraepithelial neoplasia grade 2 or worse (CIN2+). In comparison to the tests involved in this study and their combinations, the CDSS produced the most balanced results in terms of sensitivity, specificity, PPV, and NPV. The proposed system may reduce the referral rate for colposcopy and guide personalised management and therapeutic interventions.

Pichia-Expressed Recombinant D6 and DARC Negatively Affect Cell Migration and Invasion of Breast Cancer Cells

Article

Full-text available

Oct 2021

Atypical chemokine receptor proteins are termed ‘decoy proteins’ as their binding to the respective ligands does not lead to a typical signaling pathway but intercepts the action of chemokines. This method of chemokine activity regulation may also function in tumor suppression. D6 and DARC (Duffy Antigen Receptor for Chemokines) have been reported as decoy chemokine receptors in cancer studies. Purified Pichia-expressed D6 and DARC, produced in-house, were used in cell-based studies to test their biological activities. Cell viability tests showed that recombinant D6 and DARC did not affect cell viability significantly, suggesting that they were not involved in breast cancer cell death. Wound healing assays showed that the presence of recombinant D6 or DARC at 10 µg/mL optimally inhibited the migration of breast cancer cells. ELISA showed an inverse relationship between the recombinant proteins and CCL2 levels in the treated cells. Migration assay using Boyden chamber demonstrated the function of the recombinant proteins in inhibiting chemotaxis activity of treated cells. Invasion assay showed the ability of the recombinant proteins in inhibiting the invasion property of treated cells. Comparison of single and combinatorial effects of the recombinant proteins showed that the combination of D6 and DARC at a 1:1 ratio (10 µg/mL) is most effective in reducing CCL2 levels and inhibiting the migration and invasion of treated cells. It was shown that the purified Pichia-expressed recombinant D6 and DARC are the negative regulators of breast cancer cell migration and invasion, and the inhibition effects were greater when they were used in combination.

Synergy Network Inference Model Based on Heterogeneous Data Integration

Article

Full-text available

Feb 2018

Identifying High-Risk Breast Cancer Patients Using Microarray and Clinical Data

Conference Paper

Dec 2020

Tumour profiling tests to guide adjuvant chemotherapy decisions in early breast cancer: A systematic review and economic analysis

Article

Jun 2019

Background: Breast cancer and its treatment can have an impact on health-related quality of life and survival. Tumour profiling tests aim to identify whether or not women need chemotherapy owing to their risk of relapse. Objectives: To conduct a systematic review of the effectiveness and cost-effectiveness of the tumour profiling tests Oncotype DX® (Genomic Health, Inc., Redwood City, CA, USA), MammaPrint® (Agendia, Inc., Amsterdam, the Netherlands), Prosigna® (NanoString Technologies, Inc., Seattle, WA, USA), EndoPredict® (Myriad Genetics Ltd, London, UK) and immunohistochemistry 4 (IHC4). To develop a health economic model to assess the cost-effectiveness of these tests compared with clinical tools to guide the use of adjuvant chemotherapy in early-stage breast cancer from the perspective of the NHS and Personal Social Services. Design: A systematic review and health economic analysis were conducted. Review methods: The systematic review was partially an update of a 2013 review. Nine databases were searched in February 2017. The review included studies assessing clinical effectiveness in people with oestrogen receptor-positive, human epidermal growth factor receptor 2-negative, stage I or II cancer with zero to three positive lymph nodes. The economic analysis included a review of existing analyses and the development of a de novo model. Results: A total of 153 studies were identified. Only one completed randomised controlled trial (RCT) using a tumour profiling test in clinical practice was identified: Microarray In Node-negative Disease may Avoid ChemoTherapy (MINDACT) for MammaPrint. Other studies suggest that all the tests can provide information on the risk of relapse; however, results were more varied in lymph node-positive (LN+) patients than in lymph node-negative (LN0) patients. There is limited and varying evidence that Oncotype DX and MammaPrint can predict benefit from chemotherapy.
The net change in the percentage of patients with a chemotherapy recommendation or decision pre/post test ranged from an increase of 1% to a decrease of 23% among UK studies and a decrease of 0% to 64% across European studies. The health economic analysis suggests that the incremental cost-effectiveness ratios for the tests versus current practice are broadly favourable for the following scenarios: (1) Oncotype DX, for the LN0 subgroup with a Nottingham Prognostic Index (NPI) of > 3.4 and the one to three positive lymph nodes (LN1–3) subgroup (if a predictive benefit is assumed); (2) IHC4 plus clinical factors (IHC4+C), for all patient subgroups; (3) Prosigna, for the LN0 subgroup with an NPI of > 3.4 and the LN1–3 subgroup; (4) EndoPredict Clinical, for the LN1–3 subgroup only; and (5) MammaPrint, for no subgroups. Limitations: There was only one completed RCT using a tumour profiling test in clinical practice. Except for Oncotype DX in the LN0 group with an NPI score of > 3.4 (clinical intermediate risk), evidence surrounding pre- and post-test chemotherapy probabilities is subject to considerable uncertainty. There is uncertainty regarding whether or not Oncotype DX and MammaPrint are predictive of chemotherapy benefit. The MammaPrint analysis uses a different data source to the other four tests. The Translational substudy of the Arimidex, Tamoxifen, Alone or in Combination (TransATAC) study (used in the economic modelling) has a number of limitations. Conclusions: The review suggests that all the tests can provide prognostic information on the risk of relapse; results were more varied in LN+ patients than in LN0 patients. There is limited and varying evidence that Oncotype DX and MammaPrint are predictive of chemotherapy benefit. Health economic analyses indicate that some tests may have a favourable cost-effectiveness profile for certain patient subgroups; all estimates are subject to uncertainty.
More evidence is needed on the prediction of chemotherapy benefit, long-term impacts and changes in UK pre-/post-chemotherapy decisions. Study registration: This study is registered as PROSPERO CRD42017059561. Funding: The National Institute for Health Research Health Technology Assessment programme.

The Construction of Recombinant D6 Clone for in Vitro Breast Cancer Study

Article

Jan 2013

D6, also known as CCBP2, is one of the decoy chemokine receptors. It was recently found to play a role in the progression of breast cancer cells. In this study, the presence of D6 in the invasive breast cancer cell line MDA-MB-231 was investigated by one-step RT-PCR with additional Pfu DNA polymerase in the reaction. The amplicons were then sequenced and compared with the reference sequence from the GenBank database. Nucleotide sequence analysis showed that the amplicon sequence matches the reference sequence, confirming that the full-length D6 sequence had been amplified from MDA-MB-231. Index Terms: Cloning, D6, DNA sequencing, MDA-MB-231.

Missing Value Estimation Methods for DNA Microarrays

Article

Jul 2001

Motivation: Gene expression microarray experiments can generate data sets with multiple missing expression values. Unfortunately, many algorithms for gene expression analysis require a complete matrix of gene array values as input. For example, methods such as hierarchical clustering and K-means clustering are not robust to missing data, and may lose effectiveness even with a few missing values. Methods for imputing missing data are needed, therefore, to minimize the effect of incomplete data sets on analyses, and to increase the range of data sets to which these algorithms can be applied. In this report, we investigate automated methods for estimating missing data. Results: We present a comparative study of several methods for the estimation of missing values in gene microarray data. We implemented and evaluated three methods: a Singular Value Decomposition (SVD) based method (SVDimpute), weighted K-nearest neighbors (KNNimpute), and row average. We evaluated the methods using a variety of parameter settings and over different real data sets, and assessed the robustness of the imputation methods to the amount of missing data over the range of 1--20% missing values. We show that KNNimpute appears to provide a more robust and sensitive method for missing value estimation than SVDimpute, and both SVDimpute and KNNimpute surpass the commonly used row average method (as well as filling missing values with zeros). We report results of the comparative experiments and provide recommendations and tools for accurate estimation of missing microarray data under a variety of conditions.
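The weighted K-nearest-neighbors idea named in the abstract above can be illustrated with a minimal sketch. This is not the authors' KNNimpute implementation: the function name `knn_impute`, the root-mean-square distance over jointly observed samples, the inverse-distance weights, and the row-average fallback are all assumptions chosen for illustration.

```python
import numpy as np

def knn_impute(X, k=3):
    """Fill NaNs in a genes x samples matrix (hypothetical helper).

    Each missing entry is replaced by a distance-weighted average of the
    same sample's value in the k most similar genes that have it observed.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    for i, j in zip(*np.where(np.isnan(X))):
        candidates = []
        for g in range(X.shape[0]):
            # A neighbor must be a different gene with sample j observed.
            if g == i or np.isnan(X[g, j]):
                continue
            shared = ~np.isnan(X[i]) & ~np.isnan(X[g])
            if not shared.any():
                continue
            # Root-mean-square difference over jointly observed samples.
            d = np.sqrt(np.mean((X[i, shared] - X[g, shared]) ** 2))
            candidates.append((d, X[g, j]))
        if not candidates:
            # Assumed fallback: row average of the gene's observed values.
            filled[i, j] = np.nanmean(X[i])
            continue
        candidates.sort(key=lambda t: t[0])
        dists = np.array([d for d, _ in candidates[:k]])
        vals = np.array([v for _, v in candidates[:k]])
        weights = 1.0 / (dists + 1e-9)  # closer genes contribute more
        filled[i, j] = np.sum(weights * vals) / np.sum(weights)
    return filled

expr = np.array([[1.0, 2.0, 3.0],
                 [1.0, 2.0, np.nan],
                 [10.0, 20.0, 30.0]])
imputed = knn_impute(expr, k=1)
```

With `k=1`, the missing value in the second gene is taken from the first gene, which matches it exactly on the two samples both have observed.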

Kanehisa M, Goto S, Furumichi M, Tanabe M, Hirakawa M. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res 38: D355-D360

Article

Oct 2009
NUCLEIC ACIDS RES

Most human diseases are complex multi-factorial diseases resulting from the combination of various genetic and environmental factors. In the KEGG database resource (http://www.genome.jp/kegg/), diseases are viewed as perturbed states of the molecular system, and drugs as perturbants to the molecular system. Disease information is computerized in two forms: pathway maps and gene/molecule lists. The KEGG PATHWAY database contains pathway maps for the molecular systems in both normal and perturbed states. In the KEGG DISEASE database, each disease is represented by a list of known disease genes, any known environmental factors at the molecular level, diagnostic markers and therapeutic drugs, which may reflect the underlying molecular system. The KEGG DRUG database contains chemical structures and/or chemical components of all drugs in Japan, including crude drugs and TCM (Traditional Chinese Medicine) formulas, and drugs in the USA and Europe. This database also captures knowledge about two types of molecular networks: the interaction network with target molecules, metabolizing enzymes, other drugs, etc. and the chemical structure transformation network in the history of drug development. The new disease/drug information resource named KEGG MEDICUS can be used as a reference knowledge base for computational analysis of molecular networks, especially, by integrating large-scale experimental datasets.

Gene expression profiling predicts clinical outcome of breast cancer

Article

Feb 2002

Breast cancer patients with the same stage of disease can have markedly different treatment responses and overall outcome. The strongest predictors for metastases (for example, lymph node status and histological grade) fail to classify accurately breast tumours according to their clinical behaviour. Chemotherapy or hormonal therapy reduces the risk of distant metastases by approximately one-third; however, 70-80% of patients receiving this treatment would have survived without it. None of the signatures of breast cancer gene expression reported to date allow for patient-tailored therapy strategies. Here we used DNA microarray analysis on primary breast tumours of 117 young patients, and applied supervised classification to identify a gene expression signature strongly predictive of a short interval to distant metastases ('poor prognosis' signature) in patients without tumour cells in local lymph nodes at diagnosis (lymph node negative). In addition, we established a signature that identifies tumours of BRCA1 carriers. The poor prognosis signature consists of genes regulating cell cycle, invasion, metastasis and angiogenesis. This gene expression profile will outperform all currently used clinical parameters in predicting disease outcome. Our findings provide a strategy to select patients who would benefit from adjuvant therapy.

A Gene-Expression Signature as a Predictor of Survival in Breast Cancer

Article

Dec 2002
NEW ENGL J MED

A more accurate means of prognostication in breast cancer will improve the selection of patients for adjuvant systemic therapy. Using microarray analysis to evaluate our previously established 70-gene prognosis profile, we classified a series of 295 consecutive patients with primary breast carcinomas as having a gene-expression signature associated with either a poor prognosis or a good prognosis. All patients had stage I or II breast cancer and were younger than 53 years old; 151 had lymph-node-negative disease, and 144 had lymph-node-positive disease. We evaluated the predictive power of the prognosis profile using univariable and multivariable statistical analyses. Among the 295 patients, 180 had a poor-prognosis signature and 115 had a good-prognosis signature, and the mean (±SE) overall 10-year survival rates were 54.6±4.4 percent and 94.5±2.6 percent, respectively. At 10 years, the probability of remaining free of distant metastases was 50.6±4.5 percent in the group with a poor-prognosis signature and 85.2±4.3 percent in the group with a good-prognosis signature. The estimated hazard ratio for distant metastases in the group with a poor-prognosis signature, as compared with the group with the good-prognosis signature, was 5.1 (95 percent confidence interval, 2.9 to 9.0; P<0.001). This ratio remained significant when the groups were analyzed according to lymph-node status. Multivariable Cox regression analysis showed that the prognosis profile was a strong independent factor in predicting disease outcome. The gene-expression profile we studied is a more powerful predictor of the outcome of disease in young patients with breast cancer than standard systems based on clinical and histologic criteria.

Expression profiling predicts outcome in breast cancer

Article

Feb 2003

Gruvberger et al. postulate, in their commentary published in this issue of Breast Cancer Research, that our “prognostic gene set may not be broadly applicable to other breast tumor cohorts”, and they suggest that “it may be important to define prognostic expression profiles separately in estrogen receptor (ER) positive and negative tumors”. This is based on two observations derived from our gene expression profiling data in breast cancer: the overlap between reporter genes for prognosis and ER status, and Gruvberger et al.’s inability to confirm the prognosis prediction using a nonoptimal selection of 58 of our 231 prognosis reporter genes.

KEGG for representation and analysis of molecular networks involving diseases and drugs

Article

Oct 2009

Minoru Kanehisa

Twenty years ago the Human Genome Project was initiated aiming to uncover the genetic factors of human diseases and to develop new strategies for diagnosis, treatment, and prevention. Despite the successful sequencing of the human genome and the discovery of many disease related genes, our understanding of molecular mechanisms is still largely incomplete for the majority of diseases. In the KEGG database project we have been organizing our knowledge on cellular functions and organism behaviors in computable forms, especially in the forms of molecular networks (KEGG pathway maps) and hierarchical lists (BRITE functional hierarchies). The computerized knowledge has been widely used as a reference for biological interpretation of large-scale datasets generated by sequencing and other high-throughput experimental technologies. Our efforts are now focused on human diseases and drugs. We consider diseases as perturbed states of the molecular system that operates the cell and the organism, and drugs as perturbants to the molecular system. Since the existing disease databases are mostly for humans to read and understand, we develop a more computable disease information resource where our knowledge on diseases is represented as molecular networks or gene/molecule lists. When the detail of the molecular system is relatively well characterized, we use the molecular network representation and draw KEGG pathway maps. The Human Diseases category of the KEGG PATHWAY database contains about 40 pathway maps for cancers, immune disorders, neurodegenerative diseases, etc. When the detail is not known but disease genes are identified, we use the gene/molecule list representation and create a KEGG DISEASE entry. The entry contains a list of known disease genes and other relevant molecules including environmental factors, diagnostic markers, and therapeutic drugs. The list simply defines the membership to the underlying molecular system, but is still useful for computational analysis. 
In the KEGG DRUG database we capture knowledge on two types of molecular networks. One is the interaction network of drugs with target molecules, metabolizing enzymes, transporters, other drugs, and the pathways involving all these molecules. The other is the chemical structure transformation network representing the biosynthetic pathways of natural products in various organisms, as well as the history of drug development where drug structures have been continuously modified by medicinal chemists. KEGG DRUG contains chemical structures and/or chemical components of all prescription and OTC drugs in Japan including crude drugs and TCM (Traditional Chinese Medicine) formulas, as well as most prescription drugs in USA and many prescription drugs in Europe. I will report on our strategy to analyze the chemical architecture of natural products derived from enzymatic reactions (and enzyme genes) and the chemical architecture of marketed drugs derived from human made organic reactions in the history of drug development, towards drug discovery from the genomes of plants and microorganisms.

Derivation of molecular signatures for breast cancer recurrence prediction using a two-way validation approach

Article

Apr 2009
BREAST CANCER RES TR

Previous studies have demonstrated the potential value of gene expression signatures in assessing the risk of post-surgical breast cancer recurrence; however, many of these predictive models have been derived using simple computational algorithms and validated internally or using one-way validation on a single dataset. We have recently developed a new feature selection algorithm that overcomes some limitations inherent to high-dimensional data analysis. In this study, we applied this algorithm to two publicly available gene expression datasets obtained from over 400 patients with breast cancer to investigate whether we could derive more accurate prognostic signatures and reveal common predictive factors across independent datasets. We compared the performance of three advanced computational algorithms using a robust two-way validation method, where one dataset was used for training and to establish a prediction model that was then blindly tested on the other dataset. The experiment was then repeated in the reverse direction. Analyses identified prognostic signatures that, while comprising only 10-13 genes, significantly outperformed previously reported signatures for breast cancer evaluation. The cross-validation approach revealed CEGP1 and PRAME as major candidates for breast cancer biomarker development.
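The two-way validation protocol described in the abstract above (train on one dataset, test blindly on the other, then swap) can be sketched as follows. The nearest-centroid classifier is only a stand-in assumed for illustration; it is not one of the algorithms compared in that study, and the function names and toy data are hypothetical.

```python
import numpy as np

def fit_centroids(X, y):
    """Learn one mean expression profile (centroid) per class."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(model, X):
    """Assign each sample to the class with the closest centroid."""
    classes, centroids = model
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]

def two_way_validation(XA, yA, XB, yB):
    """Train on A and test on B, then train on B and test on A."""
    acc_ab = np.mean(predict(fit_centroids(XA, yA), XB) == yB)
    acc_ba = np.mean(predict(fit_centroids(XB, yB), XA) == yA)
    return float(acc_ab), float(acc_ba)

# Toy stand-ins for two independent cohorts (samples x genes).
XA = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
yA = np.array([0, 0, 1, 1])
XB = np.array([[1.0, 0.0], [0.0, 2.0], [6.0, 5.0], [4.0, 6.0]])
yB = np.array([0, 0, 1, 1])
acc_ab, acc_ba = two_way_validation(XA, yA, XB, yB)
```

Reporting both directions separately, rather than averaging them, keeps visible any asymmetry between the two cohorts.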

Relationship Between Tumor Location and Relapse in 6,781 Women With Early Invasive Breast Cancer

Article

Sep 2000

To explore the independent prognostic impact of medial hemisphere tumor location in early breast cancer. A comprehensive database was used to review patients referred to the British Columbia Cancer Agency from 1989 to 1995 with early breast cancer. Patients were grouped according to relapse risk (high or nonhigh) and adjuvant systemic therapy received. Multiple regression analysis was used to determine whether the significance of primary tumor location (medial v lateral hemisphere) was independent of known prognostic factors and treatment. In the adjuvant systemic therapy groups, medial location was associated with a 50% excess risk of systemic relapse and breast cancer death compared with lateral location. Five-year systemic disease-free survival rates were 66.3% and 74.2% for high-risk medial and lateral lesions, respectively (P <.005). Corresponding 5-year disease-specific survival rates were 75.7% and 80.8%, respectively (P <.03). No significant differences were observed between medial and lateral location for low-risk disease regardless of adjuvant therapy or for high-risk disease with no adjuvant therapy. Local recurrence rates were similar for all risk and therapy groups. The two-fold risk of relapse and breast cancer death associated with high-risk medial breast tumors may be due to occult spread to internal mammary nodes (IMNs). Enhanced local control, such as with irradiation of the IMN chain, may be one way to reduce the excess risk. Ongoing randomized controlled trials may provide prospective answers to the question of the optimal volume of radiotherapy.

"Good Old" clinical markers have similar power in breast cancer prognosis as microarray gene expression profilers

Article

Sep 2004
EUR J CANCER

We compared the power of gene expression measurements with that of conventional prognostic markers, i.e., clinical, histopathological, and cell biological parameters, for predicting distant metastases in breast cancer patients using both established prognostic indices (e.g., the Nottingham Prognostic Index (NPI)) and novel combinations of conventional markers. We used publicly available data on 97 patients, and the performance of metastasis prediction was represented by receiver operating characteristic (ROC) areas and Kaplan-Meier plots. The gene expression profiler did not perform noticeably better than indices constructed from the clinical variables, e.g., the well-established NPI. When subgroups were analysed separately according to oestrogen receptor (ER) status, both approaches could predict clinical outcome more easily for the ER-positive than for the ER-negative cohort. Given the time it may take before microarray processing is used worldwide, particularly due to the costs and the lack of standards, it is important to pursue research using conventional markers. Our analysis suggests that it might be possible to improve the combination of different conventional prognostic markers into one prognostic index.

Which Cyclin E Prevails as Prognostic Marker for Breast Cancer? Results from a Retrospective Study Involving 635 Lymph Node-Negative Breast Cancer Patients

Article

Jun 2006

To evaluate the prognostic value of cyclin E with a quantitative method for lymph node-negative primary breast cancer patients. mRNA transcripts of full-length and splice variants of cyclin E1 (CCNE1) and cyclin E2 (CCNE2) were measured by real-time PCR in frozen tumor samples from 635 lymph node-negative breast cancer patients who had not received neoadjuvant or adjuvant systemic therapy. None of the PCR assays designed for the specific splice variants of the cyclins gave additional prognosis-related information compared with the common assays able to detect all variants. In Cox multivariate analysis, corrected for the traditional prognostic factors, high levels of cyclin E were independently associated with a short distant metastasis-free survival [hazard ratio (HR), 3.40; P < 0.001 for CCNE1 and HR, 1.76; P < 0.001 for CCNE2, respectively]. After dichotomizing the tumors at the median level of 70% tumor cells, the multivariate analysis showed particularly strong results for CCNE1 in the group of 433 patients with stroma-enriched primary tumors (HR, 5.12; P < 0.001). In these tumors, the worst prognosis was found for patients with estrogen receptor-negative tumors expressing high CCNE1 (HR, 9.89; P < 0.001) and for patients with small (T1) tumors expressing high CCNE1 (HR, 8.47; P < 0.001). Our study shows that both CCNE1 and CCNE2 qualify as independent prognostic markers for lymph node-negative breast cancer patients, and that CCNE1 may provide additional information for specific subgroups of patients.
