Figure 3 - uploaded by Ben Evans
An example of how poor performance of a few probes in the non-target species can affect the rank of many genes, even ones whose probes perform equally in both species. Ten genes (a, b, c, d, e, f, g, h, i, and j) are ranked according to their expression intensity. In the non-target species, probes directed against genes e, h, and j perform poorly because of sequence divergence, so these genes have a low rank even though there actually is no expression divergence. This elevates the rank of many other genes, causing an overall negative median rank difference (RD) and a positive skew in the RD distribution. In this example, significantly upregulated genes in the target species tend to have a higher average rank in this species (9) than the significantly upregulated genes in the non-target species do in that species (6.5). Significantly upregulated genes in the target species have a lower average rank in the non-target species (3) than the significantly upregulated genes in the non-target species do in the target species (3.5). doi:10.1371/journal.pone.0003279.g003
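The mechanism described in the caption can be illustrated with a small sketch. The rank assignment below is an assumed arrangement chosen for illustration, not the values plotted in Fig. 3, so it is not expected to reproduce the average ranks quoted above. It assumes the target species ranks genes a to j from 1 (lowest expression) to 10, and that in the non-target species the poorly performing probes for e, h, and j drop to the three lowest ranks. RD is computed as target rank minus non-target rank, and the skew uses the Pearson skewness coefficient with the sample standard deviation (also an assumption).

```python
import statistics

# Hypothetical ranks: 1 = lowest expression, 10 = highest expression.
genes = list("abcdefghij")
target_rank = {g: i + 1 for i, g in enumerate(genes)}

# Assumed effect of sequence divergence: e, h and j fall to the bottom
# of the non-target ranking, which pushes every other gene upward.
poor_probes = ["e", "h", "j"]
nontarget_order = poor_probes + [g for g in genes if g not in poor_probes]
nontarget_rank = {g: i + 1 for i, g in enumerate(nontarget_order)}

# Rank difference (RD) = target rank minus non-target rank.
rd = {g: target_rank[g] - nontarget_rank[g] for g in genes}
rd_values = list(rd.values())

median_rd = statistics.median(rd_values)
mean_rd = statistics.mean(rd_values)
# Pearson skewness coefficient: 3 * (mean - median) / standard deviation.
skew_rd = 3 * (mean_rd - median_rd) / statistics.stdev(rd_values)

print(rd)
print("median RD:", median_rd, "skew:", round(skew_rd, 2))
```

In this toy arrangement only three genes have biased probes, yet all seven remaining genes receive a negative RD, which gives a negative median and a positive skew.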


Source publication
Article
Prefabricated expression microarrays are currently available for only a few species but methods have been proposed to extend their application to comparisons between divergent genomes. Here we demonstrate that the hybridization intensity of genomic DNA is a poor basis on which to select unbiased probes on Affymetrix expression arrays for studies of...

Contexts in source publication

Context 1
... XB, and XM; as detailed below, these results are qualitatively similar to those recovered with the gDNA probemask of [17,18]. Probes that perfectly match sequences from XL and XB have a wide range of XB/XL gDNA ratios (Fig. 2A). Under a best-case scenario, this indicates that using the gDNA ratio as a criterion for probe retention will not retain all perfect match probes. But we also found that other probes that we know mismatch both paralogs of genes in XB have a range of XB/XL gDNA ratios that overlaps extensively with the gDNA ratios of probes that perfectly match both species (Fig. 2B). This point is also illustrated by examination of four probesets on the Xenopus laevis Affymetrix microarray that were designed to interrogate XB transcripts: XlAffx.1.5.S1_at, XlAffx.1.9.S1_at, XlAffx.1.10.S1_at, and XlAffx.1.12.S1_at. Sixty out of the 64 probes in these four probesets do not perfectly match XL, and these also have a broad range of gDNA ratios (Fig. 2A). Together these observations indicate that gDNA ratios provide a poor basis for selection of perfect match probes in non-target species on the Affymetrix GeneChip® Xenopus laevis Genome Array. In addition to not retaining many probes that perfectly match both species, this approach almost certainly results in the retention of probes that do not perfectly match the non-target species. When testis expression data from XL, XB, and H XLXB are analyzed using our XB/XL gDNA probemask or using our XM/XL gDNA probemask, we recover similar results to the analysis of testis expression data from XL, XM, and H XLXM by Malone et al. [17]. This suggests that evolutionary differences between XB and XM, possible differences in the geographic origin of XL animals, and variation in laboratory procedures associated with microarray hybridizations together had a much smaller impact on the results than did the type of probe mask used in the analysis.
More specifically, in this analysis the asymmetry in expression divergence is significant and more substantial than results from the XB + XL perfect match probemask such that expression in the hybrid appears much more similar to the target than the non-target species (Table 1). This is because using a gDNA probemask instead of a perfect match probemask results in a significantly lower proportion of genes that are divergently expressed in the comparison between XL and H XLXB and a significantly higher proportion of genes that are divergently expressed between XB and H XLXB (P ≤ 0.002 for both comparisons). We explored alternative analytical approaches including invariant set (IS) normalization [28] and the probe logarithmic intensity error (PLIER) method for calculating signal intensity [29]. These procedures produce results that are qualitatively similar to those recovered with RMA normalization with each probemask. The asymmetry in divergent expression in testis between each parental species and the hybrid with the XB + XL perfect match probemask is of similar magnitude in each of these analyses (1.34, 1.45 and 1.39 for RMA, IS, and PLIER, respectively). Likewise, more than twice as much asymmetry in divergent expression in testis is recovered when RMA, IS, or PLIER normalization is used with gDNA probemasks (i.e. there are more divergently expressed genes between the non-target species and the hybrid than between XL and the hybrid with these probemasks; data not shown). Thus we conclude that the method of normalization also does not account for the substantial differences in results that are obtained from perfect match versus gDNA probemasks. The nature of the discrepancy between results obtained from these different probemasks is further illuminated by consideration of some of the technical aspects of the analysis.
When microarray data are normalized it is generally assumed that the overall distribution of expression intensities within each treatment is similar [30–32]. Moreover, most normalization methods were developed for comparisons between treatments with expression divergence at only a few genes [33]. When data are normalized with the quantile method [30], for example, which was used in this study and in [16–18], the expression intensities of each probe are ranked and replaced by the average intensity of each quantile (each rank). This procedure yields identical distributions of overall expression intensities across treatments, even if they were very different to begin with. If the overall distribution of expression intensities was similar in each treatment before normalization, it is reasonable to expect that the magnitude and direction of expression divergence should be unbiased – that for a given magnitude of expression divergence, a similar number of genes will be upregulated in one treatment as is upregulated in the other. To test this, we calculated the difference in expression rank for each gene included in the analysis, with the lowest rank corresponding to the gene with the lowest expression as depicted in Fig. 3. Additionally, the skew of this distribution was quantified by the Pearson skewness coefficient ( = 3*(mean-median)/standard deviation). Departure of the observed median rank difference and skew of the distribution of rank differences from the null hypotheses of a median and skew of zero was assessed by comparison to a null distribution generated from 1000 randomized ranks using scripts written in PERL. When interspecific data from the target species and a non-target species were analyzed using a gDNA probemask, the median rank difference was negative and this median departed significantly and substantially from zero (Table 2). 
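The rank-difference test described above can be sketched as follows. This is a minimal illustration, not the authors' Perl scripts: the excerpt does not specify how the 1000 randomized ranks were generated, so the sketch assumes that both rank vectors are shuffled independently, and it assumes the sample standard deviation in the skewness coefficient. All function names are hypothetical.

```python
import random
import statistics


def pearson_skew(values):
    """Pearson skewness coefficient: 3 * (mean - median) / standard deviation."""
    sd = statistics.stdev(values)
    if sd == 0:
        return 0.0
    return 3 * (statistics.mean(values) - statistics.median(values)) / sd


def rank_differences(target_ranks, nontarget_ranks):
    """Per-gene rank difference: target rank minus non-target rank."""
    return [t - n for t, n in zip(target_ranks, nontarget_ranks)]


def randomized_null(n_genes, n_perm=1000, seed=0):
    """Null medians and skews from independently shuffled ranks (assumed scheme)."""
    rng = random.Random(seed)
    medians, skews = [], []
    for _ in range(n_perm):
        a = list(range(1, n_genes + 1))
        b = list(range(1, n_genes + 1))
        rng.shuffle(a)
        rng.shuffle(b)
        rd = rank_differences(a, b)
        medians.append(statistics.median(rd))
        skews.append(pearson_skew(rd))
    return medians, skews


def two_sided_p(observed, null_values):
    """Fraction of null values at least as extreme as the observed statistic."""
    extreme = sum(1 for v in null_values if abs(v) >= abs(observed))
    return (extreme + 1) / (len(null_values) + 1)


# Toy ranks (hypothetical): genes 5, 8 and 10 drop to the bottom in the non-target.
target = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
nontarget = [4, 5, 6, 7, 1, 8, 9, 2, 10, 3]
observed_rd = rank_differences(target, nontarget)
null_medians, null_skews = randomized_null(len(target))
p_median = two_sided_p(statistics.median(observed_rd), null_medians)
p_skew = two_sided_p(pearson_skew(observed_rd), null_skews)
print(p_median, p_skew)
```

With only ten genes the toy p-values carry little weight; the point is the shape of the procedure, which becomes informative at the scale of thousands of probesets.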
The skew of the distribution of rank differences was significantly and substantially positive in these interspecific comparisons (Table 2). While these metrics are not independent because the median is used in the calculation of skew, they provide qualitative information about the rank difference distributions in these analyses. Because we calculated the rank difference by subtracting the non-target rank from the target rank, a negative median indicates that the non-target sequences tend to be upregulated to a greater degree than do the target sequences. A positive skew of this distribution (Table 2) indicates a tail on the right, suggesting that some probesets have a much higher rank (higher expression) in XL but not the reverse. In contrast, when intraspecific comparisons were analyzed with gDNA probemasks, the median and skew never departed as substantially from the null expectation as the interspecific comparisons between a target and non-target species, although occasionally the intraspecific departure was significant (Table 2). When the XB + XL perfect match probemask was used in the analysis, the median and skew were not significantly different from the null expectation (Table 3). While occasional departure from the null in some intraspecific comparisons between different XL tissues probably has a biological basis and could also stem from variation between laboratories in microarray protocol, these comparisons suggest that the substantially negative median and positive skew of the rank difference in interspecies comparisons analyzed with gDNA probemasks has a technical rather than a biological basis. When gDNA probemasks are used, we suspected that differential performance of some probesets in the non-target species could cause a spurious signal of upregulation and downregulation compared to another species (Fig. 3). 
One class of significantly differently expressed genes – those that appear to be upregulated in the target species (XL) – could result when probes hybridize poorly to transcripts of the non-target species. The other class of significantly differently expressed genes – those that appear to be upregulated in the non-target species (XB or XM) – could result when the ranks of some genes in the non-target species are elevated as a result of the other genes that are interrogated by biased probes having a lower rank (Fig. 3). A key difference between these two classes of divergently expressed genes is that a larger proportion of the genes that appear upregulated in XL are interrogated by probes with differential performance (bias) between species. In analyses with a gDNA probemask, therefore, we predicted that the expression rank of genes that appear to be significantly upregulated in the non-target species would be highly correlated with the expression rank of these genes in the target species. We expected this correlation to be much higher than the correlation between the ranks of genes upregulated in the target species and the rank of these same genes in the non-target species. To test this, we calculated the Spearman’s rank correlation (SRC) of the rank in each treatment of (i) genes upregulated in the non-target species and (ii) genes upregulated in the target species. Under our hypothesis that many of the genes that are upregulated in the non-target species are false positives, we expected that the SRC would be much higher in (i) than in (ii). To quantify this expectation, we calculated the absolute value of the difference in the SRC in (i) and (ii) for the interspecies comparisons, and we refer to this difference as d SRC. 
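The d SRC statistic defined above can be sketched in a few lines. The example is illustrative only: the gene sets and ranks are invented, and Spearman's rank correlation is implemented directly (average ranks for ties, then a Pearson correlation of the ranks) so that the block needs no third-party library. The function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rank correlation (SRC)."""
    return pearson(average_ranks(x), average_ranks(y))


def d_src(up_in_nontarget, up_in_target):
    """Absolute difference between the SRC of class (i) and class (ii).

    Each argument is a list of (target_rank, nontarget_rank) pairs.
    """
    src_i = spearman([t for t, _ in up_in_nontarget], [n for _, n in up_in_nontarget])
    src_ii = spearman([t for t, _ in up_in_target], [n for _, n in up_in_target])
    return abs(src_i - src_ii)


# Hypothetical ranks: class (i) keeps its order across species, class (ii) does not.
up_in_nontarget = [(1, 4), (2, 5), (3, 6), (4, 7)]
up_in_target = [(5, 3), (8, 1), (10, 2)]
print(d_src(up_in_nontarget, up_in_target))
```

Here class (i) is perfectly rank-correlated between species while class (ii) is not, so d SRC is large, which is the pattern the authors expect when apparent upregulation in the non-target species is a by-product of a few poorly performing probes.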
For comparative purposes, d SRC was calculated for interspecific comparisons between XL and a non-target species, comparisons between each species and a hybrid, and intraspecific comparisons between different tissues of XL, and this was performed for analyses with each type of probemask. The data support our expectation. When the XB/XL gDNA probemask or the XM/XL gDNA probemask are used in interspecific comparisons, the d SRC of the rank of genes upregulated in the non-target species is substantially higher than that of genes upregulated in the target species or in hybrids (Table 2). When comparisons were made between tissue types in XL or within a tissue type of XL and a hybrid using these gDNA probemasks, extreme differences ...
Context 2
... these analyses (1.34, 1.45 and 1.39 for RMA, IS, and PLIER, respectively). Likewise, more than twice as much asymmetry in divergent expression in testis is recovered when RMA, IS, or PLIER normalization are used with gDNA probemasks (i.e. there are more divergently expressed genes between the non-target species and the hybrid than between XL and the hybrid with these probemasks; data not shown). Thus we conclude that the method of normalization also does not account for the substantial differences in results that are obtained from perfect match versus gDNA probemasks. The nature of the discrepancy between results obtained from these different probemasks is further illuminated by consideration of some of the technical aspects of the analysis. When microarray data are normalized it is generally assumed that the overall distribution of expression intensities within each treatment is similar [30–32]. Moreover, most normalization methods were developed for comparisons between treatments with expression divergence at only a few genes [33]. When data are normalized with the quantile method [30], for example, which was used in this study and in [16–18], the expression intensities of each probe are ranked and replaced by the average intensity of each quantile (each rank). This procedure yields identical distributions of overall expression intensities across treatments, even if they were very different to begin with. If the overall distribution of expression intensities was similar in each treatment before normalization, it is reasonable to expect that the magnitude and direction of expression divergence should be unbiased – that for a given magnitude of expression divergence, a similar number of genes will be upregulated in one treatment as is upregulated in the other. To test this, we calculated the difference in expression rank for each gene included in the analysis, with the lowest rank corresponding to the gene with the lowest expression as depicted in Fig. 3. 
Additionally, the skew of this distribution was quantified by the Pearson skewness coefficient ( = 3*(mean-median)/standard deviation). Departure of the observed median rank difference and skew of the distribution of rank differences from the null hypotheses of a median and skew of zero was assessed by comparison to a null distribution generated from 1000 randomized ranks using scripts written in PERL. When interspecific data from the target species and a non-target species were analyzed using a gDNA probemask, the median rank difference was negative and this median departed significantly and substantially from zero (Table 2). The skew of the distribution of rank differences was significantly and substantially positive in these interspecific comparisons (Table 2). While these metrics are not independent because the median is used in the calculation of skew, they provide qualitative information about the rank difference distributions in these analyses. Because we calculated the rank difference by subtracting the non-target rank from the target rank, a negative median indicates that the non-target sequences tend to be upregulated to a greater degree than do the target sequences. A positive skew of this distribution (Table 2) indicates a tail on the right, suggesting that some probesets have a much higher rank (higher expression) in XL but not the reverse. In contrast, when intraspecific comparisons were analyzed with gDNA probemasks, the median and skew never departed as substantially from the null expectation as the interspecific comparisons between a target and non-target species, although occasionally the intraspecific departure was significant (Table 2). When the XB + XL perfect match probemask was used in the analysis, the median and skew were not significantly different from the null expectation (Table 3). 
While occasional departure from the null in some intraspecific comparisons between different XL tissues probably has a biological basis and could also stem from variation between laboratories in microarray protocol, these comparisons suggest that the substantially negative median and positive skew of the rank difference in interspecies comparisons analyzed with gDNA probemasks has a technical rather than a biological basis. When gDNA probemasks are used, we suspected that differential performance of some probesets in the non-target species could cause a spurious signal of upregulation and downregulation compared to another species (Fig. 3). One class of significantly differently expressed genes – those that appear to be upregulated in the target species (XL) – could result when probes hybridize poorly to transcripts of the non-target species. The other class of significantly differently expressed genes – those that appear to be upregulated in the non-target species (XB or XM) – could result when the ranks of some genes in the non-target species are elevated as a result of the other genes that are interrogated by biased probes having a lower rank (Fig. 3). A key difference between these two classes of divergently expressed genes is that a larger proportion of the genes that appear upregulated in XL are interrogated by probes with differential performance (bias) between species. In analyses with a gDNA probemask, therefore, we predicted that the expression rank of genes that appear to be significantly upregulated in the non-target species would be highly correlated with the expression rank of these genes in the target species. We expected this correlation to be much higher than the correlation between the ranks of genes upregulated in the target species and the rank of these same genes in the non-target species. 
To test this, we calculated the Spearman’s rank correlation (SRC) of the rank in each treatment of (i) genes upregulated in the non-target species and (ii) genes upregulated in the target species. Under our hypothesis that many of the genes that are upregulated in the non-target species are false positives, we expected that the SRC would be much higher in (i) than in (ii). To quantify this expectation, we calculated the absolute value of the difference in the SRC in (i) and (ii) for the interspecies comparisons, and we refer to this difference as d SRC. For comparative purposes, d SRC was calculated for interspecific comparisons between XL and a non-target species, comparisons between each species and a hybrid, and intraspecific comparisons between different tissues of XL, and this was performed for analyses with each type of probemask. The data support our expectation. When the XB/XL gDNA probemask or the XM/XL gDNA probemask are used in interspecific comparisons, the d SRC of the rank of genes upregulated in the non-target species is substantially higher than that of genes upregulated in the target species or in hybrids (Table 2). When comparisons were made between tissue types in XL or within a tissue type of XL and a hybrid using these gDNA probemasks, extreme differences between d SRC of each of these classes of genes were not observed (Table 2). A high d SRC was not observed in any of the analyses with the XB + XL perfect match probemask (Table 3). Furthermore, we found other signs of technical bias in results generated with gDNA probemasks, but not the XB + XL perfect match probemask, by comparing the mean rank of significantly upregulated genes (Supporting Information S1, Table S1, Table S2). 
Taken together, these observations are consistent with the notion that the use of probemasks based on gDNA ratios on the Affymetrix GeneChip® Xenopus laevis Genome Array produces spurious results when comparisons are made directly between species or between a non-target species and a hybrid, irrespective of tissue type. When gDNA probemasks are used, many of the genes that are putatively upregulated in the non-target species are actually false positives whose high ranks are an artifact of the low ranks of poorly performing probesets. Of course, this group of genes may include some genes that are not false positives, but it is not clear which ones these are. We suspect then, albeit with caveats discussed below, that our analysis with the XB + XL perfect match probemask is a closer approximation of biological variation than that recovered by [17,18]. A challenge to the implementation of single-species microarrays in comparative transcriptomics is the identification of unbiased probes. Due to differences from the target species, such as sequence divergence, non-target transcripts will exhibit a range of probe hybridization efficiencies that cause technical variation in hybridization intensities. In comparative analyses, normalization may overcompensate for genes with lower than average divergence and undercompensate for genes with higher than average divergence [34]. Exacerbating this problem, our analysis of confirmed perfect match probes in a target and a non-target species illustrates that the gDNA ratio is an unreliable metric with which to identify unbiased probes on the Affymetrix GeneChip® Xenopus laevis Genome Array. This approach selects probes with low gDNA intensity (Fig. 1), misses probes that do perfectly match both species (Fig. 2A), and includes probes that do not perfectly match both species (Fig. 2B).
The implications of this are large and affect fundamental conclusions of the analysis, such as which and how many genes are significantly or not significantly differently expressed. Notably, our analyses suggest that including biased probes in a microarray analysis not only leads to spurious results from these biased probes, but also affects conclusions drawn from genes that are interrogated by probes that perform equally well in both species. We anticipate, therefore, that comparisons between species using probes that are selected by gDNA ratios, including the comparisons between XB or XM and XL that are presented here, are characterized by a high level of false positives as well as false negatives. Many of the genes from this type ...
Context 3
... H XLXB (P ≤ 0.002 for both comparisons). We explored alternative analytical approaches including invariant set (IS) normalization [28] and the probe logarithmic intensity error (PLIER) method for calculating signal intensity [29]. These procedures produce results that are qualitatively similar to those recovered with RMA normalization with each probemask. The asymmetry in divergent expression in testis between each parental species and the hybrid with the XB + XL perfect match probemask is of similar magnitude in each of these analyses (1.34, 1.45 and 1.39 for RMA, IS, and PLIER, respectively). Likewise, more than twice as much asymmetry in divergent expression in testis is recovered when RMA, IS, or PLIER normalization is used with gDNA probemasks (i.e. there are more divergently expressed genes between the non-target species and the hybrid than between XL and the hybrid with these probemasks; data not shown). Thus we conclude that the method of normalization also does not account for the substantial differences in results that are obtained from perfect match versus gDNA probemasks. The nature of the discrepancy between results obtained from these different probemasks is further illuminated by consideration of some of the technical aspects of the analysis. When microarray data are normalized it is generally assumed that the overall distribution of expression intensities within each treatment is similar [30–32]. Moreover, most normalization methods were developed for comparisons between treatments with expression divergence at only a few genes [33]. When data are normalized with the quantile method [30], for example, which was used in this study and in [16–18], the expression intensities of each probe are ranked and replaced by the average intensity of each quantile (each rank). This procedure yields identical distributions of overall expression intensities across treatments, even if they were very different to begin with.
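The quantile procedure described above can be sketched in pure Python. This is a simplified illustration under stated assumptions, not the implementation used in the study: every treatment is assumed to have the same number of probes, ties are not given special handling, and the function name and intensities are hypothetical.

```python
def quantile_normalize(columns):
    """Replace each value by the mean, across treatments, of the values at its rank.

    `columns` is a list of equal-length lists, one list of probe
    intensities per treatment (array).
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all treatments must have the same number of probes")

    # Mean intensity at each rank position across treatments.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[r] for col in sorted_cols) / len(columns) for r in range(n)]

    normalized = []
    for col in columns:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, probe_index in enumerate(order):
            out[probe_index] = rank_means[rank]
        normalized.append(out)
    return normalized


# Two hypothetical treatments with different raw intensity distributions.
raw = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
norm = quantile_normalize(raw)
print(norm)
```

After normalization both treatments contain exactly the same set of values and differ only in which probe holds which value, which is why a probe that loses rank for technical reasons necessarily raises the normalized intensity of other probes.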
If the overall distribution of expression intensities was similar in each treatment before normalization, it is reasonable to expect that the magnitude and direction of expression divergence should be unbiased – that for a given magnitude of expression divergence, a similar number of genes will be upregulated in one treatment as is upregulated in the other. To test this, we calculated the difference in expression rank for each gene included in the analysis, with the lowest rank corresponding to the gene with the lowest expression as depicted in Fig. 3. Additionally, the skew of this distribution was quantified by the Pearson skewness coefficient ( = 3*(mean-median)/standard deviation). Departure of the observed median rank difference and skew of the distribution of rank differences from the null hypotheses of a median and skew of zero was assessed by comparison to a null distribution generated from 1000 randomized ranks using scripts written in PERL. When interspecific data from the target species and a non-target species were analyzed using a gDNA probemask, the median rank difference was negative and this median departed significantly and substantially from zero (Table 2). The skew of the distribution of rank differences was significantly and substantially positive in these interspecific comparisons (Table 2). While these metrics are not independent because the median is used in the calculation of skew, they provide qualitative information about the rank difference distributions in these analyses. Because we calculated the rank difference by subtracting the non-target rank from the target rank, a negative median indicates that the non-target sequences tend to be upregulated to a greater degree than do the target sequences. A positive skew of this distribution (Table 2) indicates a tail on the right, suggesting that some probesets have a much higher rank (higher expression) in XL but not the reverse. 
In contrast, when intraspecific comparisons were analyzed with gDNA probemasks, the median and skew never departed as substantially from the null expectation as the interspecific comparisons between a target and non-target species, although occasionally the intraspecific departure was significant (Table 2). When the XB + XL perfect match probemask was used in the analysis, the median and skew were not significantly different from the null expectation (Table 3). While occasional departure from the null in some intraspecific comparisons between different XL tissues probably has a biological basis and could also stem from variation between laboratories in microarray protocol, these comparisons suggest that the substantially negative median and positive skew of the rank difference in interspecies comparisons analyzed with gDNA probemasks has a technical rather than a biological basis. When gDNA probemasks are used, we suspected that differential performance of some probesets in the non-target species could cause a spurious signal of upregulation and downregulation compared to another species (Fig. 3). One class of significantly differently expressed genes – those that appear to be upregulated in the target species (XL) – could result when probes hybridize poorly to transcripts of the non-target species. The other class of significantly differently expressed genes – those that appear to be upregulated in the non-target species (XB or XM) – could result when the ranks of some genes in the non-target species are elevated as a result of the other genes that are interrogated by biased probes having a lower rank (Fig. 3). A key difference between these two classes of divergently expressed genes is that a larger proportion of the genes that appear upregulated in XL are interrogated by probes with differential performance (bias) between species. 
In analyses with a gDNA probemask, therefore, we predicted that the expression rank of genes that appear to be significantly upregulated in the non-target species would be highly correlated with the expression rank of these genes in the target species. We expected this correlation to be much higher than the correlation between the ranks of genes upregulated in the target species and the rank of these same genes in the non-target species. To test this, we calculated the Spearman’s rank correlation (SRC) of the rank in each treatment of (i) genes upregulated in the non-target species and (ii) genes upregulated in the target species. Under our hypothesis that many of the genes that are upregulated in the non-target species are false positives, we expected that the SRC would be much higher in (i) than in (ii). To quantify this expectation, we calculated the absolute value of the difference in the SRC in (i) and (ii) for the interspecies comparisons, and we refer to this difference as d SRC. For comparative purposes, d SRC was calculated for interspecific comparisons between XL and a non-target species, comparisons between each species and a hybrid, and intraspecific comparisons between different tissues of XL, and this was performed for analyses with each type of probemask. The data support our expectation. When the XB/XL gDNA probemask or the XM/XL gDNA probemask are used in interspecific comparisons, the d SRC of the rank of genes upregulated in the non-target species is substantially higher than that of genes upregulated in the target species or in hybrids (Table 2). When comparisons were made between tissue types in XL or within a tissue type of XL and a hybrid using these gDNA probemasks, extreme differences between d SRC of each of these classes of genes were not observed (Table 2). A high d SRC was not observed in any of the analyses with the XB + XL perfect match probemask (Table 3). 
Furthermore, we found other signs of technical bias in results generated with gDNA probemasks, but not the XB + XL perfect match probemask, by comparing the mean rank of significantly upregulated genes (Supporting Information S1, Table S1, Table S2). Taken together, these observations are consistent with the notion that the use of probemasks based on gDNA ratios on the Affymetrix GeneChip® Xenopus laevis Genome Array produces spurious results when comparisons are made directly between species or between a non-target species and a hybrid, irrespective of tissue type. When gDNA probemasks are used, many of the genes that are putatively upregulated in the non-target species are actually false positives whose high ranks are an artifact of the low ranks of poorly performing probesets. Of course, this group of genes may include some genes that are not false positives, but it is not clear which ones these are. We suspect then, albeit with caveats discussed below, that our analysis with the XB + XL perfect match probemask is a closer approximation of biological variation than that recovered by [17,18].

A challenge to the implementation of single-species microarrays in comparative transcriptomics is the identification of unbiased probes. Due to differences from the target species, such as sequence divergence, non-target transcripts will exhibit a range of probe hybridization efficiencies that cause technical variation in hybridization intensities. In comparative analyses, normalization may overcompensate for genes with lower than average divergence and undercompensate for genes with higher than average divergence [34]. Exacerbating this problem, our analysis of confirmed perfect match probes in a target and a non-target species illustrates that the gDNA ratio is an unreliable metric with which to identify unbiased probes on the Affymetrix GeneChip® Xenopus laevis Genome Array. This approach selects probes with low gDNA intensity (Fig. 1), misses probes that do perfectly match both species (Fig. 2A), and includes probes that do not perfectly match both species (Fig. 2B). The implications of this are large and affect fundamental conclusions of the analysis, such as which and how many genes are significantly or not significantly differentially expressed. Notably, our analyses suggest ...
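The rank artifact discussed above, in which a few poorly performing probesets inflate the ranks of many other genes, can be reproduced with a toy version of the ten-gene example in the figure. The sketch below is illustrative only: the intensities are invented, the helper `ranks_by_intensity` is hypothetical, and the rank difference (RD) is assumed here to be the rank in the target species minus the rank in the non-target species, with rank 10 being the most highly expressed gene.

```python
from statistics import median


def ranks_by_intensity(intensity):
    """Map each gene to its rank, where 1 is the lowest intensity."""
    ordered = sorted(intensity, key=intensity.get)
    return {gene: pos for pos, gene in enumerate(ordered, start=1)}


# Hypothetical intensities: genes a..j have identical true expression in both species.
target = {gene: float(i) for i, gene in enumerate("abcdefghij", start=1)}

# In the non-target species, probes for e, h and j hybridize poorly
# (an assumed effect of sequence divergence), so their measured intensity collapses.
poor_probes = ("e", "h", "j")
non_target = dict(target)
non_target.update({"e": 0.1, "h": 0.2, "j": 0.3})

rank_target = ranks_by_intensity(target)
rank_non_target = ranks_by_intensity(non_target)

# Assumed definition: RD = rank in target minus rank in non-target.
rank_diff = {g: rank_target[g] - rank_non_target[g] for g in target}

# Genes whose probes perform equally in both species but whose rank still changes.
shifted_unaffected = [g for g in target if g not in poor_probes and rank_diff[g] != 0]
median_rd = median(rank_diff.values())

print(rank_diff)
print("unaffected genes with shifted rank:", shifted_unaffected)
print("median RD:", median_rd)
```

All seven genes with well-behaved probes move up in rank in the non-target species even though their intensities are unchanged, so the median RD is negative and the RD distribution has a long positive tail contributed by e, h and j.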

Citations

... Most of the annotated genes expressed in any one species were detected in all three species (Additional file 3: Figure S2B). However, to confirm species differences in expression profiles, qPCR was used to validate hybridization results [36] by using primers with approximately equal efficiency for all three species (Additional file 4: Table S2). ...
Article
Full-text available
    Salmon species vary in susceptibility to infections with the salmon louse (Lepeophtheirus salmonis). Comparing mechanisms underlying responses in susceptible and resistant species is important for estimating impacts of infections on wild salmon, selective breeding of farmed salmon, and expanding our knowledge of fish immune responses to ectoparasites. Herein we report three L. salmonis experimental infection trials of co-habited Atlantic Salmo salar, chum Oncorhynchus keta and pink salmon O. gorbuscha, profiling hematocrit, blood cortisol concentrations, and transcriptomic responses of the anterior kidney and skin to the infection. In all trials, infection densities (lice per host weight (g)) were consistently highest on chum salmon, followed by Atlantic salmon, and lowest in pink salmon. At 43 days post-exposure, all lice had developed to motile stages, and infection density was uniformly low among species. Hematocrit was reduced in infected Atlantic and chum salmon, and cortisol was elevated in infected chum salmon. Systemic transcriptomic responses were profiled in all species and large differences in response functions were identified between Atlantic and Pacific (chum and pink) salmon. Pink and chum salmon up-regulated acute phase response genes, including complement and coagulation components, and down-regulated antiviral immune genes. The pink salmon response involved the largest and most diverse iron sequestration and homeostasis mechanisms. Pattern recognition receptors were up-regulated in all species but the active components were often species-specific. C-type lectin domain family 4 member M and acidic mammalian chitinase were specifically up-regulated in the resistant pink salmon. Experimental exposures consistently indicated increased susceptibility in chum and Atlantic salmon, and resistance in pink salmon, with differences in infection density occurring within the first three days of infection. 
Transcriptomic analysis suggested candidate resistance functions including local inflammation with cytokines, specific innate pattern recognition receptors, and iron homeostasis. Suppressed antiviral immunity in both susceptible and resistant species indicates the importance of future work investigating co-infections of viral pathogens and lice.
    ... Early in the diversification of Xenopus tetraploids, an ancestor split into two descendant lineages that eventually evolved into Xenopus laevis and the common ancestor of Xenopus borealis and Xenopus muelleri. It is estimated that X. laevis and X. borealis, which are the focus of this study, diverged from one another million years ago [9][10][11]. The question as to how many allopolyploidization events occurred to generate the extant Xenopus species remains an open one and has implications for the level of divergence between the "subgenomes" of each tetraploid lineage [5]. ...
    ... In such interspecies crosses, the F1 males are sterile and the F1 females are fully or partially fertile [24]. The generation of F1 hybrids between X. laevis and the Marsabit clawed frog X. borealis (Parker, 1936) in the laboratory has been described and microarray analyses combined with comparative transcriptomics have been used to study gene expression in the testis and brain tissue from hybrid males [10]. The aim of the present study was to use peptidomic analysis (reversed-phase HPLC coupled with mass spectrometry) to compare the AMPs, and the caerulein-related and xenopsin-related peptides in norepinephrine-stimulated skin secretions from X. laevis female × X. borealis male hybrids (hereafter X LB ) with hybrids from the reciprocal cross from X. borealis female and X. laevis male (hereafter X BL ). ...
    ... University (Protocol No. A21-09) and were carried out by authorized investigators. Both types of F1 hybrids (X LB and X BL ) between X. laevis (from the Cape region of South Africa) and X. borealis (from Kenya) were generated via in vitro fertilization as previously described [10]. Female X LB hybrids (n = 2; 5-6 years old; weights 26.1 g and 27.8 g) and female X BL hybrids (n = 2; 5-6 years old; weight 31.1 g and 31.5 g) were injected via the dorsal lymph sac with norepinephrine hydrochloride (40 nmol/g body weight) and placed in water (100 ml) for 15 min. ...
    ... Microarrays developed for a specific species have been used for transcript profiling of closely related species. Cross-species RNA hybridization (CSH) to DNA microarrays has been used successfully in both animal and plant when a representative microarray platform is not available [15–34]. The assumption that underlies the validity of CSH on a gene chip of a closely related species is that the level of sequence homology among genes conserved between closely related species is significant enough to enable the detection of messages by probes originally designed for their orthologs. ...
    ... Several CSH studies have utilized this approach [20,29,32]. However, a recent study questioned the reliability of the DNA hybridization-based method for selecting unbiased probes in CSH studies [34]. Wang et al. [27] took a different approach to identify inter-species conserved (ISC) probe sets based on the expressed sequence tag (EST) homology between target and non-target species. ...
    Data
    Full-text available
      ... However, this method was limited by an inability to distinguish between technical variation due to faulty hybridization and gene expression divergence across species. Consequently, single-species microarrays have been called into question in comparative transcriptomics (Chain et al., 2008). In contrast to the previous studies, this study compared two allotetraploid species (X. ...
      Article
        The availability of both the Xenopus tropicalis genome and the soon to be released Xenopus laevis genome provides a solid foundation for Xenopus developmental biologists. The Xenopus community has presently amassed expression data for ∼2,300 genes in the form of published images collected in the Xenbase, the principal Xenopus research database. A few of these genes have been examined in both X. tropicalis and X. laevis and the cross-species comparison has been proven invaluable for studying gene function. A recently published work has yielded developmental expression profiles for the majority of Xenopus genes across fourteen developmental stages spanning the blastula, gastrula, neurula, and the tail-bud. While this data was originally queried for global evolutionary and developmental principles, here we demonstrate its general use for gene-level analyses. In particular, we present the accessibility of this dataset through Xenbase and describe biases in the characterized genes in terms of sequence and expression conservation across the two species. We further indicate the advantage of examining coexpression for gene function discovery relating to developmental processes conserved across species. We suggest that the integration of additional large-scale datasets--comprising diverse functional data--into Xenbase promises to provide a strong foundation for researchers in elucidating biological processes including the gene regulatory programs encoding development.
        ... Several studies have reported that genes with specific attributes change expression more quickly [4–7], though it is not known whether the expression of such subsets of genes also diverges linearly with time. RNA-Seq offers a methodological improvement over microarrays for measuring expression divergence because it does not suffer from the probe-based biases that confound cross-species microarray measurements [8–11]. Technical replication studies, in which expression values are assayed more than once from the same sample, have shown that RNA-Seq quantifies relative gene expression accurately [12,13]. ...
        Article
        Full-text available
          The evolution of gene expression is a challenging problem in evolutionary biology, for which accurate, well-calibrated measurements and methods are crucial. We quantified gene expression with whole-transcriptome sequencing in four diploid, prototrophic strains of Saccharomyces species grown under the same condition to investigate the evolution of gene expression. We found that variation in expression is gene-dependent with large variations in each gene's expression between replicates of the same species. This confounds the identification of genes differentially expressed across species. To address this, we developed a statistical approach to establish significance bounds for inter-species differential expression in RNA-Seq data based on the variance measured across biological replicates. This metric estimates the combined effects of technical and environmental variance, as well as Poisson sampling noise by isolating each component. Despite a paucity of large expression changes, we found a strong correlation between the variance of gene expression change and species divergence (R² = 0.90). We provide an improved methodology for measuring gene expression changes in evolutionarily diverged species using RNA-Seq, where experimental artifacts can mimic evolutionary effects. GEO Accession Number: GSE32679.
          ... The former method, which is used primarily when commercial microarrays are not available for both species, depends on the assumption that transcripts from closely related species (e.g., human and rhesus macaque) hybridize equally well to a homologous probe. In practice, this assumption is not always correct, which clearly confounds the experimental results (Gilad et al., 2005; Sartor et al., 2006; Chain et al., 2008). Furthermore, because many interspecies comparisons involve evolutionarily distant organisms with substantial sequence divergence, this approach has somewhat limited usefulness. ...
          Chapter
            Introduction; Use for Chemical Risk Assessment; Issues to Consider; Transcriptome Level Comparisons; Tools and Approaches for Data Analysis; Conclusions; References
            ... This approach contrasts to what has been reported before for S. pennellii, where total trichomes, including stalks, were aggregately (trichome types were not distinguishable) analyzed for gene expression level by hybridization to the S. lycopersicum TOM2 microarray, an array containing sequences from a different Solanum species than what was used in the hybridization analysis (Slocombe et al., 2008). Such cross-species microarray experiments, while capable of differentiating rough relative expression levels between tissues within the same species for many (but not all) genes in the target species, are not capable of providing absolute expression levels or of being useful when comparing expression for any particular gene between species (Chain et al., 2008; Gilad et al., 2009). Thus, it is unfortunately practically impossible to draw comparisons between our approach and what has been previously published with regard to observed levels of expression for any particular gene, as our approach was fundamentally different from what has been described before. ...
            Article
            Full-text available
              Glandular trichomes play important roles in protecting plants from biotic attack by producing defensive compounds. We investigated the metabolic profiles and transcriptomes to characterize the differences between different glandular trichome types in several domesticated and wild Solanum species: Solanum lycopersicum (glandular trichome types 1, 6, and 7), Solanum habrochaites (types 1, 4, and 6), Solanum pennellii (types 4 and 6), Solanum arcanum (type 6), and Solanum pimpinellifolium (type 6). Substantial chemical differences in and between Solanum species and glandular trichome types are likely determined by the regulation of metabolism at several levels. Comparison of S. habrochaites type 1 and 4 glandular trichomes revealed few differences in chemical content or transcript abundance, leading to the conclusion that these two glandular trichome types are the same and differ perhaps only in stalk length. The observation that all of the other species examined here contain either type 1 or 4 trichomes (not both) supports the conclusion that these two trichome types are the same. Most differences in metabolites between type 1 and 4 glands on the one hand and type 6 glands on the other hand are quantitative but not qualitative. Several glandular trichome types express genes associated with photosynthesis and carbon fixation, indicating that some carbon destined for specialized metabolism is likely fixed within the trichome secretory cells. Finally, Solanum type 7 glandular trichomes do not appear to be involved in the biosynthesis and storage of specialized metabolites and thus likely serve another unknown function, perhaps as the site of the synthesis of protease inhibitors.
              ... Similarly, comparative transcriptomics are experiencing a growing interest for a wide range of studies including abiotic stress response (Campoli et al. 2009), regulation of developmental processes (Gil-Humanes et al. 2009) or differential regulation between polyploid species (He et al. 2003; Poole et al. 2007; Salentijn et al. 2009). In this context, microarrays designed for one species are being used to determine expression divergence among species (Becher et al. 2004; Weber et al. 2004; Chain et al. 2008). However, sequence mismatches could cause bias when expression divergence of two species are compared using an array designed for only one of them (Chain et al. 2008). ...
              Article
              Full-text available
                Comparative transcriptomics are useful to determine the role of orthologous genes among Triticeae species. Thus they constitute an interesting tool to improve the use of wild relatives for crop breeding. Reverse transcription quantitative real-time PCR (qPCR) is the most accurate measure of gene expression but efficient normalization is required. The choice and optimal number of reference genes must be experimentally determined and the primers optimized for cross-species amplification. Our goal was to test the utility of wheat-reference genes for qPCR normalization when species carrying the following genomes (A, B, D, R, Hv and Hch) are compared either simultaneously or in smaller subsets of samples. Wheat/barley/rye consensus primers outperformed wheat-specific ones, which indicates that consensus primers should be considered for data normalization in comparative transcriptomics. All genes tested were stable but their ranking in terms of stability differed among subsets of samples. CDC (cell division control protein, AAA-superfamily of ATPases, Ta54227) and RLI (68 kDa protein HP68 similar to Arabidopsis thaliana RNase L inhibitor protein, Ta2776) were always among the three most stable genes. The optimal number of reference genes varied between 2 and 3 depending on the subset of samples and the method used (geNorm vs. coefficient of determination between sequential normalization factors). In any case a maximum number of three reference genes would provide adequate normalization independent of the subset of samples considered. This work constitutes a substantial advance towards comparative transcriptomics using qPCR since it provides useful primers/reference genes.
                ... In addition to their utility in measuring gene expression levels, whole-genome DNA hybridization to expression arrays has various applications in small genome species; mutant mapping in yeast (genome size: ~12 Mb) and Arabidopsis (genome size: ~125 Mb) by bulk segregant analysis [2–7]; quantitative trait loci extreme array mapping in Arabidopsis [4,8,9]; and comparative genomics in yeast [10], Arabidopsis [11], malaria mosquitoes (genome size: ~278 Mb) [12], in the human malarial parasite (genome size: ~23 Mb) [13], and in Mycobacterium tuberculosis (genome size: ~4.4 Mb) [14]. However, in large genome species, such as barley (genome size: ~5.2 Gb) [15], Xenopus (genome size: ~3.1 Gb) [16], and maize (genome size: ~2.5 Gb) [17], whole-genome DNA hybridization to expression arrays has not worked out well because of cross-hybridization. Although applications of oligonucleotide expression arrays were limited in large genome species, complementary RNA (cRNA) from their transcripts was used to detect SFPs in barley [15,18,19], maize [17], wheat (genome size: ~17 Gb) [20], and cowpea (genome size: ~600 Mb) [21]. ...
                Article
                Full-text available
                  High-density oligonucleotide arrays are effective tools for genotyping numerous loci simultaneously. In small genome species (genome size: < approximately 300 Mb), whole-genome DNA hybridization to expression arrays has been used for various applications. In large genome species, transcript hybridization to expression arrays has been used for genotyping. Although rice is a fully sequenced model plant of medium genome size (approximately 400 Mb), there are a few examples of the use of rice oligonucleotide array as a genotyping tool. We compared the single feature polymorphism (SFP) detection performance of whole-genome and transcript hybridizations using the Affymetrix GeneChip Rice Genome Array, using the rice cultivars with full genome sequence, japonica cultivar Nipponbare and indica cultivar 93-11. Both genomes were surveyed for all probe target sequences. Only completely matched 25-mer single copy probes of the Nipponbare genome were extracted, and SFPs between them and 93-11 sequences were predicted. We investigated optimum conditions for SFP detection in both whole genome and transcript hybridization using differences between perfect match and mismatch probe intensities of non-polymorphic targets, assuming that these differences are representative of those between mismatch and perfect targets. Several statistical methods of SFP detection by whole-genome hybridization were compared under the optimized conditions. Causes of false positives and negatives in SFP detection in both types of hybridization were investigated. The optimizations allowed a more than 20% increase in true SFP detection in whole-genome hybridization and a large improvement of SFP detection performance in transcript hybridization. Significance analysis of the microarray for log-transformed raw intensities of PM probes gave the best performance in whole genome hybridization, and 22,936 true SFPs were detected with 23.58% false positives by whole genome hybridization. 
For transcript hybridization, stable SFP detection was achieved for highly expressed genes, and about 3,500 SFPs were detected at a high sensitivity (> 50%) in both shoot and young panicle transcripts. High SFP detection performances of both genome and transcript hybridizations indicated that microarrays of a complex genome (e.g., of Oryza sativa) can be effectively utilized for whole genome genotyping to conduct mutant mapping and analysis of quantitative traits such as gene expression levels.
                  Article
                  Full-text available
                    Common bean (Phaseolus vulgaris L.) and soybean (Glycine max) both belong to the Phaseoleae tribe and share significant coding sequence homology. This suggests that the GeneChip(R) Soybean Genome Array (soybean GeneChip) may be used for gene expression studies using common bean. To evaluate the utility of the soybean GeneChip for transcript profiling of common bean, we hybridized cRNAs purified from nodule, leaf, and root of common bean and soybean in triplicate to the soybean GeneChip. Initial data analysis showed a decreased sensitivity and accuracy of measuring differential gene expression in common bean cross-species hybridization (CSH) GeneChip data compared to that of soybean. We employed a method that masked putative probes targeting inter-species variable (ISV) regions between common bean and soybean. A masking signal intensity threshold was selected that optimized both sensitivity and accuracy of measuring differential gene expression. After masking for ISV regions, the number of differentially-expressed genes identified in common bean was increased by 2.8-fold reflecting increased sensitivity. Quantitative RT-PCR (qRT-PCR) analysis of 20 randomly selected genes and purine-ureide pathway genes demonstrated an increased accuracy of measuring differential gene expression after masking for ISV regions. We also evaluated masked probe frequency per probe set to gain insight into the sequence divergence pattern between common bean and soybean. The sequence divergence pattern analysis suggested that the genes for basic cellular functions and metabolism were highly conserved between soybean and common bean. Additionally, our results show that some classes of genes, particularly those associated with environmental adaptation, are highly divergent. The soybean GeneChip is a suitable cross-species platform for transcript profiling in common bean when used in combination with the masking protocol described. 
In addition to transcript profiling, CSH of the GeneChip in combination with masking probes in the ISV regions can be used for comparative ecological and/or evolutionary genomics studies.