The IARC has classified arsenic (As) as “carcinogenic to humans.” Despite the health consequences of arsenic exposure, no molecular signature is yet available that can predict when exposure may lead to the development of disease. To understand the molecular processes underlying arsenic exposure and the risk of disease development, this study investigated the functional relationship between high arsenic exposure and disease risk using gene expression data derived from human exposure. A three-step analysis was employed: (1) gene expression profiles obtained from two diverse arsenic-exposed Asian populations were used to identify differentially expressed genes associated with arsenic exposure in human subjects; (2) gene expression profiles induced by arsenic exposure in four different myeloma cancer cell lines were used to define common genes and pathways altered by arsenic exposure; and (3) the genetic profiles of two publicly available human bladder cancer studies were used to test the significance of the genes common to steps 1 and 2 and to develop and validate a predictive model of primary bladder cancer risk associated with arsenic exposure. Our analysis shows that arsenic exposure in humans is mainly associated with organismal injury and abnormalities, immunological disease, inflammatory disease, gastrointestinal disease, and increased rates of a wide variety of cancers. In addition, arsenic exerts its toxicity by generating reactive oxygen species (ROS); increased ROS production causes an imbalance that leads to cell and tissue damage (oxidative stress). Oxidative stress activates inflammatory pathways, leading to the transformation of a normal cell into a tumor cell. Specifically, there is significant evidence of progressive changes in oxidative/nitrative stress during the progression of bladder cancer.
Therefore, we examined the relationship between genes differentially expressed due to arsenic exposure in humans and bladder cancer, and developed a bladder cancer risk prediction model. In this study, integrin-linked kinase (ILK) signaling was one of the most significant pathways identified in both arsenic-exposed populations; ILK plays a key role in eliciting a protective response to oxidative damage in epidermal cells. On the other hand, several studies have shown that arsenic trioxide (ATO) is useful for anticancer therapy, although the mechanisms underlying its paradoxical effects are still not well understood. ATO has shown remarkable efficacy in the treatment of multiple myeloma; therefore, it is helpful to understand the cancer biology by which ATO exerts its inhibitory effect on myeloma cells. Our study found that MAPK is one of the most active networks shared between the arsenic-associated genes and the ATO-treated cell lines; it is involved in oxidative/nitrosative damage and is well associated with the development of bladder cancer. The study identified a unique set of 147 genes associated with arsenic exposure and linked to molecular mechanisms of cancer. The risk prediction model shows the highest predictive ability for recurrent bladder tumors based on a very small subset (NKIRAS2, AKTIP, and HLA-DQA1) of the 147 genes, resulting in an AUC of 0.94 (95% CI: 0.744-0.995) and 0.75 (95% CI: 0.343-0.933) on the training and validation data, respectively.
1. Introduction
Arsenic (As) is a ubiquitous element in the environment, ranked the 20th most abundant element on earth. The toxic impact of arsenic on human health has been documented in numerous studies, leading to its identification as a known carcinogen by the International Agency for Research on Cancer (IARC), the National Toxicology Program (NTP), and the United States Environmental Protection Agency (EPA) [1, 2]. In addition to cancer, long-term exposure to arsenic has been associated with developmental effects, cardiovascular disease, neurotoxicity, and diabetes (WHO, https://www.who.int/news-room/fact-sheets/detail/As). Typically, arsenic would be found only at background levels in soil and groundwater. However, high levels of arsenic accumulate in these media from anthropogenic activities such as indiscriminate waste disposal from mining, milling, and smelting of ores [3], raw and spent oil shale [4], and coal fly ash amendments [5]. The usage pattern for arsenic compounds in the United States in the 1960s was 77% pesticides, 18% glass, and 4% industrial chemicals. The past use of arsenic as a pesticide in agriculture is exemplified by New Jersey, where it is estimated that approximately 15 million pounds of arsenic were applied to soils between 1900 and 1960 [6]. Leaching of arsenic from soils into the water supply has resulted in significant contamination of drinking water in many areas of the United States and the world. This past anthropogenic use of arsenic has made arsenic exposure a global public health problem [7–9]. Over 120 million people are affected by arsenic exposure, many of whom reside in Bangladesh and India [8, 10]. A recent study has modeled atmospheric exposure to arsenic as being additive to overall exposure levels [11].
Despite the health consequences of arsenic exposure, there is no molecular signature that might predict the risk of developing cancer or other diseases following exposure to arsenic.
On the other hand, the use of arsenicals as therapeutic agents in medicine is well known, dating back more than 2400 years to ancient Greece and Rome [12]. In the 19th century, potassium arsenite was used to treat various diseases [13], including diabetes, psoriasis, syphilis, skin ulcers, and joint diseases. More recently, phase I/II trials have been conducted in heavily pretreated patients with relapsed or refractory multiple myeloma, and arsenic trioxide (ATO) has been shown to be the most active single agent in acute promyelocytic leukemia (both multiple myeloma and acute promyelocytic leukemia are types of blood cancer) [14]. Another study suggested that ATO can be used as an effective alternative therapeutic for the treatment of retinoblastoma, which is the most common intraocular cancer in children [15]. That study shows that the antitumor activity of arsenic mainly targets multiple pathways in malignant cells, resulting in the promotion of differentiation or the induction of apoptosis. As a reverse engineering approach, this is very helpful for understanding the molecular mechanisms of the cancer biology associated with arsenic exposure.
Biomarkers are classified based on exposure, effect, and susceptibility [16]. For arsenic, biomarkers of exposure have received the greatest attention and have been the most successful in defining individual exposures [17]. Human susceptibility to arsenic, especially as it applies to predicting disease states, is probably the least studied area of biomarkers. A few biomarkers of interest that are attracting study include clastogenicity in peripheral lymphocytes, micronuclei in oral mucosa and bladder cells, and induction of heme oxygenase [16, 18, 19]. The goal of the present study was to identify differentially expressed genes in arsenic-exposed humans and to determine whether a molecular signature could be developed that would stratify and predict the risk of urothelial cancer for those with known exposure to arsenic. Urothelial cancer, the most common type of bladder cancer, was chosen as an initial proof of principle because the epidemiological and other evidence linking arsenic to the development of urothelial cancer is strong, and because publicly available databases exist for data mining [7, 20–24]. A common theme of such studies is a strong association at more extreme exposure levels (>150 μg/L), whereas the health effects that may develop below this threshold remain uncertain. Suggested mechanisms for arsenic carcinogenesis include oxidative damage, epigenetic effects, and interference with DNA repair. In addition, the development of bladder cancer is known to have a strong association with environmental exposures from the anthropogenic activities mentioned above [25]. Overproduction of reactive oxygen species (ROS) due to arsenic exposure primarily follows direct toxicity or the metabolic processing of arsenic products.
Inhibition of succinic dehydrogenase activity in mitochondrial complexes I and III of the electron transport chain produces the superoxide radical anion, while monomethylarsonic acid (MMA) and dimethylarsinic acid (DMA) form radicals in the cell, specifically in the endoplasmic reticulum [26, 27]. Since inorganic arsenic compounds tend to be more toxic than organic ones, ATO is of interest both as a global concern and for its involvement in oxidative and nitrosative stress. Translational damage from reactive species can regulate the MAPK family or induce extended states of inflammation; genetic and epigenetic mechanisms such as these are indicative of oxidative/nitrosative damage and are well associated with the development of bladder cancer [28–30]. ILK signaling and the neuroinflammation signaling pathway were the pathways most frequently affected by arsenic exposure, and both are highly associated with oxidative stress. Oxidative stress and neuroinflammation could potentiate each other to promote the progression of mental disorders [31], whereas ILK plays complex roles in the modulation of oxidant species production [32].
The strategy used in the present study involved three steps. The first step was a blood cell gene expression analysis of two diverse human populations with known levels of exposure to arsenic. One population was stratified into low and high exposure, and the second population into low, medium, and high exposure, and these strata were correlated with human global gene expression. After identifying statistically significant genes unique to these test conditions, we found that cancer was the most significant disease, and lipid metabolism (which is considered a major metabolic pathway involved in the progression of cancer) was the most significant molecular and cellular function, associated with the genes differentially expressed at different levels of arsenic. Therefore, the second step was to compare these genes with data from four independent myeloma cell lines that had been treated with arsenic trioxide (ATO) to understand the molecular mechanism of cancer. Many of the genes that were up- and downregulated due to arsenic exposure are associated with cancer biology. These gene lists were then subjected to enrichment analysis to identify statistically significant pathways and were further scrutinized for functional relevance. The third step was to develop a model by examining the ability of the most significant genes to predict the progression and possible development of bladder cancer using publicly available patient biopsy samples. Using this approach, we developed a robust regression model based on three significant probes and their corresponding genes, with an AUC of 0.94 (95% CI: 0.744-0.995) on the training data and 0.75 (95% CI: 0.343-0.933) on the validation data. The most significant pathway identified is integrin-linked kinase (ILK), which plays a key role in eliciting a protective response to oxidative damage in epidermal cells [32].
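As an illustration of how an AUC and its confidence interval, such as those reported above, can be obtained from a model's predicted scores, the following is a minimal sketch. It computes the AUC with the rank (Mann-Whitney) formulation and estimates a percentile bootstrap interval. The function names, the bootstrap procedure, and the toy labels and scores are assumptions made for illustration only; they are not the method or data of the present study.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (illustrative assumption)."""
    auc(labels, scores)  # raises early if only one class is present
    rng = random.Random(seed)
    indices = range(len(labels))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(indices) for _ in indices]
        ys = [labels[i] for i in sample]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain a single class
        stats.append(auc(ys, [scores[i] for i in sample]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical toy data: 1 = tumor, 0 = control; scores are model outputs.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print("AUC:", auc(toy_labels, toy_scores))
print("95% CI:", bootstrap_ci(toy_labels, toy_scores))
```

With small validation sets, bootstrap intervals of this kind tend to be wide, which is consistent with the broad interval reported for the validation data.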
2. Materials and Methods
2.1. Data
Two publicly available gene expression datasets from previously conducted experiments on two independent populations were accessed. The set from Bangladesh (Gene Expression Omnibus (GEO) ID: GSE57711) had 29 individuals: 16 males and 13 females. The second dataset was from Pakistan (GEO ID: GSE110852) and had 57 individuals: 31 males and 26 females. In this report, the set from Bangladesh is denoted as Data1 and that from Pakistan as Data2; both remain unchanged from their original, respective studies. Data1 samples were part of a clinical trial in June 2011 [33]. For these samples, “low” exposure levels correspond to a range of 50-200 μg/L, whereas “high” levels correspond to a range of 232 to 1000 μg/L (no samples were collected from patients exposed in the range of 201-231 μg/L). Data2 samples were from two main districts of rural Pakistan, Lahore and Kasur. That study aimed to investigate the blood transcriptome profile of the exposed samples in order to correlate gene expression with exposure levels of As [34]. Urine sampling was used to define levels of arsenic exposure, with “low” being 0-50 μg/g creatinine, “medium” 51-100 μg/g creatinine, and “high” >101 μg/g creatinine. The general characteristics of both datasets are detailed in Table 1. The results from four multiple myeloma cell lines treated with ATO were obtained from the GEO database, series GSE14519 [35]. These cell lines (U266, MM1S, KMS11, and 8226S) were exposed to ATO for 6 hr, 28 hr, and 48 hr before analysis. Gene expression profiling was used to determine differences in cell line response to ATO. That study was used as a reference point in the present study since it documents the effects of arsenic compounds on gene expression at different exposure levels.
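Table 1: General characteristics of both datasets.

| Dataset | Total samples | Males | Females | Low exposure | Medium exposure | High exposure |
|---|---|---|---|---|---|---|
| Data1 | | | | Water As 50–200 (μg/L) | — | Water As 232–1000 (μg/L) |
| GSE57711 | 29 | 16 | 13 | 15 | — | 14 |
| Data2 | | | | Water As (μg/L) | Water As (μg/L) | Water As (μg/L) |
| GSE110852 | 57 | 31 | 26 | 18 | 19 | 20 |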
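The exposure categories described for Data1 and Data2 can be sketched as simple threshold functions. This is a minimal illustration, not the original studies' code: the function names are hypothetical, and for Data2 the boundaries are assumed to be continuous cutoffs at 50 and 100 μg/g creatinine, because the reported bins (0-50, 51-100, >101) leave small gaps between categories.

```python
def classify_data1(water_as_ug_per_l):
    """Data1 (GSE57711): category from water arsenic in ug/L.

    Returns None for values outside the two reported ranges, including
    the unsampled 201-231 ug/L gap.
    """
    if 50 <= water_as_ug_per_l <= 200:
        return "low"
    if 232 <= water_as_ug_per_l <= 1000:
        return "high"
    return None


def classify_data2(urine_as_ug_per_g_creatinine):
    """Data2 (GSE110852): category from urinary arsenic in ug/g creatinine.

    Assumption: cutoffs at 50 and 100 are treated as continuous boundaries.
    """
    if urine_as_ug_per_g_creatinine < 0:
        raise ValueError("exposure cannot be negative")
    if urine_as_ug_per_g_creatinine <= 50:
        return "low"
    if urine_as_ug_per_g_creatinine <= 100:
        return "medium"
    return "high"


# Hypothetical exposure values, for illustration only.
print(classify_data1(120), classify_data1(500), classify_data1(210))
print(classify_data2(30), classify_data2(75), classify_data2(150))
```

Keeping the two classifiers separate reflects the fact that the datasets define exposure on different measurements (water concentration versus creatinine-adjusted urinary arsenic), so their categories are not directly interchangeable.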