To address this challenge we used a novel strategy based on genes in which we detected single feature polymorphisms (SFPs). SFPs are genetic polymorphisms observed as expression differences within one particular feature (oligonucleotide probe) of a probe set (11 PM and MM probes) on the array [21]. Using two barley 'Genetical Genomics' datasets we have previously shown that SFPs mainly represent expression differences that are the result of polymorphism in cis-acting regulators [22].
On this basis we propose that differential expression detected in SFP-containing genes is more likely to reflect true differential expression and so we use this as a criterion to assess the efficacy of the seven methods referred to above in the detection of differential gene expression.
The present study implements seven methods commonly used in the literature to calculate expression indices from Affymetrix microarray gene expression data, which was collected from a well-designed genome-wide microarray hybridization experiment with eight genetically divergent barley cultivars. We explore various statistical properties of the methods in modelling and analyzing the microarray dataset. The findings are compared with those based on an independent dataset of Affymetrix genome-wide gene expression profiled on two divergent yeast strains.
To explore the consistency of the 22, barley gene expression indices estimated by the seven different methods, we calculated Pearson's product-moment correlation coefficients between the expression estimates. The correlation analyses are summarized in Table 2.
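The pairwise comparison described above can be sketched as follows. This is a minimal illustration, not the authors' code: the method names and expression values are hypothetical, and the helper names `pearson` and `pairwise_correlations` are assumptions introduced here.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(estimates):
    """Map each unordered pair of method names to the correlation of their estimates."""
    return {
        (m1, m2): pearson(estimates[m1], estimates[m2])
        for m1, m2 in combinations(sorted(estimates), 2)
    }


# Hypothetical log-scale expression indices for five genes on one array.
toy_estimates = {
    "method_A": [7.1, 8.4, 6.2, 9.8, 5.5],
    "method_B": [7.0, 8.6, 6.0, 9.9, 5.7],
    "method_C": [6.1, 9.0, 7.5, 8.2, 6.4],
}
correlations = pairwise_correlations(toy_estimates)
```

With seven methods the same function would yield 21 off-diagonal entries of a correlation table.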
The corresponding results based on the yeast dataset are summarized in Table 4 [see Additional file 1]. The same pattern of correlation in gene expression estimates among the seven methods was also recovered in the analysis of gene expression profiles of the two yeast strains.
The diagonal elements in Table 2 represent means and standard deviations of correlation coefficients in gene expression indices between biological replicates. They show that MAS5.
We compared the ability of each method to calculate consistent gene expression values between biological replicates of a given barley variety using the intra-class correlation coefficients.
Statistical properties of estimated barley gene expression indices from seven data extraction methods.
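The text does not say which form of the intra-class correlation coefficient was used. As a hedged sketch, the one-way random-effects form for a balanced design can be computed for a single gene as below; the function name and the toy values are assumptions made for illustration.

```python
def intraclass_correlation(groups):
    """One-way random-effects ICC for equally sized groups of replicate values."""
    n_groups = len(groups)
    k = len(groups[0])
    if any(len(g) != k for g in groups):
        raise ValueError("balanced design required")
    grand_mean = sum(sum(g) for g in groups) / (n_groups * k)
    group_means = [sum(g) / k for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = k * sum((m - grand_mean) ** 2 for m in group_means)
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_groups * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical expression index of one gene: three varieties, three replicates each.
toy_gene = [[7.1, 7.3, 7.0], [9.2, 9.0, 9.4], [5.8, 6.1, 5.9]]
icc = intraclass_correlation(toy_gene)
```

Values near 1 indicate that replicates of the same variety agree closely relative to the differences between varieties.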
For each method the three columns from left to right correspond to FDR levels 0.
To explain the different performances of the methods illustrated above, we investigated the effect of each processing step on the estimates of the expression indices in the barley dataset. We varied the background correction method while keeping the normalization and summarization steps fixed when estimating the genome-wide gene expression indices, and calculated the correlation coefficient for each pair-wise comparison of background correction methods.
The correlation coefficients for the MAS5. Therefore the background correction methods did not have a significant effect on the correlation between methods. To compare the ability of the seven data extraction methods to detect differentially expressed genes among the barley varieties, we focused primarily on sensitivity, defined as the total number of genes detected with significant differential expression at a given FDR level.
Figures 1b and 2b [see Additional file 2] show the number of genes with significant differential expression called by the seven methods across a range of FDR levels, for the barley and yeast datasets respectively. Across all FDR levels, there was marked variation among the seven methods in the number of genes detected as differentially expressed. The variation in FDR across the seven methods occurs for two reasons: firstly, variation in the number of genes detected as significantly differentially expressed among the varieties and, secondly, variation in the number of genes expected to be detected as significantly differentially expressed when there is no real differential expression.
Shedden et al. Figures 1c and 2c [see Additional File 1] show how the p-value threshold required to achieve a given FDR value differs substantially among the seven methods, for the barley and yeast datasets respectively. Notably, Figures 1b and 1c, and likewise Figures 2b and 2c [see Additional file 2], show exactly the same ordering of the seven methods, indicating that calibration plays an important role in determining sensitivity in detecting differential gene expression.
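The link between sensitivity and the p-value threshold can be illustrated with the Benjamini and Hochberg step-up rule, which the study uses for FDR control. The sketch below assumes per-gene p-values are already available; `bh_calls` and the toy p-values are hypothetical and do not reproduce the authors' pipeline.

```python
def bh_calls(p_values, fdr):
    """Return (number of genes called, largest p-value called) under Benjamini-Hochberg."""
    m = len(p_values)
    ordered = sorted(p_values)
    n_called = 0
    for rank, p in enumerate(ordered, start=1):
        if p <= fdr * rank / m:
            n_called = rank  # keep the largest rank that passes the step-up rule
    threshold = ordered[n_called - 1] if n_called else None
    return n_called, threshold


# Hypothetical per-gene p-values from one data extraction method.
toy_p_values = [0.0004, 0.003, 0.011, 0.019, 0.048, 0.09, 0.21, 0.37, 0.62, 0.88]
for level in (0.01, 0.05, 0.10):
    n_called, threshold = bh_calls(toy_p_values, level)
    print(f"FDR {level}: {n_called} genes called, p-value threshold {threshold}")
```

A method whose p-value distribution allows a larger threshold at the same FDR level calls more genes, which is the relationship between the two sets of figures described above.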
An important aspect of comparing the different methods is their ability to detect the same differentially expressed genes, that is, their mutual predictability. The MAS5. However, all pair-wise comparisons between methods showed that every method detected differentially expressed genes not detected by the other methods. This suggests that all methods contribute unique but important information on differential gene expression.
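One simple way to quantify mutual predictability is the fraction of genes called by one method that a second method also calls, together with the genes unique to each method. The following is a sketch under that assumption; the gene identifiers, method names and function names are hypothetical.

```python
from itertools import permutations


def mutual_agreement(calls):
    """For each ordered pair (a, b): fraction of genes called by a that b also calls."""
    return {
        (a, b): len(calls[a] & calls[b]) / len(calls[a])
        for a, b in permutations(sorted(calls), 2)
        if calls[a]  # skip methods that called no genes
    }


def unique_calls(calls):
    """Genes called by exactly one method, keyed by that method."""
    return {
        name: genes - set().union(*(g for other, g in calls.items() if other != name))
        for name, genes in calls.items()
    }


# Hypothetical sets of genes declared differentially expressed by three methods.
toy_calls = {
    "method_A": {"g1", "g2", "g3", "g4"},
    "method_B": {"g2", "g3", "g5"},
    "method_C": {"g3", "g6"},
}
agreement = mutual_agreement(toy_calls)
unique = unique_calls(toy_calls)
```

The measure is deliberately asymmetric: a sensitive method can cover most of another method's calls while the reverse fraction stays low.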
Interestingly, methods calling similar genes as differentially expressed did not exhibit greater expression similarity. For example, the gene expression index calculated from the MAS5. On the other hand, the expression index from MAS5. The results of the yeast data analysis (Table 6 [see Additional file 4]) show exactly the same ordering of the seven methods as that obtained from the barley dataset. An important objective was to compare the ability of each method to identify genuine differential expression.
To this end, we used a recently identified set of over barley genes containing single feature polymorphisms that largely represent gene expression markers (GEMs) corresponding to a combination of mainly cis-acting expression regulators but also trans-acting regulators [22].
On this basis, and in the absence of an expected outcome of the differential expression analysis, we propose that differential expression detected for SFP genes is more likely to reflect true differential expression than that detected for genes that do not contain SFPs. Using this criterion we compared the seven methods for their ability to detect differential gene expression in the SFP genes (Figure 1d), using the proportion of genes declared differentially expressed that contained SFPs.
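The SFP criterion reduces to a proportion per method: of the genes a method declares differentially expressed, how many carry an SFP. A minimal sketch, with hypothetical gene identifiers and an assumed helper name:

```python
def sfp_proportion(called_genes, sfp_genes):
    """Fraction of genes called differentially expressed that are also SFP genes."""
    if not called_genes:
        return 0.0
    return len(called_genes & sfp_genes) / len(called_genes)


# Hypothetical SFP gene set and per-method differential expression calls.
toy_sfp_genes = {"g1", "g3", "g4", "g9"}
toy_method_calls = {
    "method_A": {"g1", "g2", "g3", "g4"},
    "method_B": {"g2", "g5", "g9"},
}
proportions = {
    name: sfp_proportion(genes, toy_sfp_genes)
    for name, genes in toy_method_calls.items()
}
# Rank methods from the highest to the lowest SFP proportion.
ranking = sorted(proportions, key=proportions.get, reverse=True)
```

Because the SFP set is derived independently of the expression indices, a higher proportion is read here as evidence that more of a method's calls are genuine.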
It should be noted that the SFP analysis does not involve any of the methods under investigation here for quantifying gene expression. Thus, the SFP prediction provides an independent source of information for assessing performance of the methods in detecting differentially expressed genes. The development of pre-processing methods for Affymetrix oligonucleotide gene expression data has been an area of active research and has led to the availability of a large and growing toolbox of statistical methods for data extraction.
The present study examined the effect of different data extraction methods on the detection of differentially expressed genes in a barley Affymetrix oligonucleotide microarray dataset. Seven commonly used data extraction methods were used exactly as recommended by their developers, providing a directly relevant comparison of the methods as they will be used in practice by the majority of users of the software, and thus avoiding the well-known over-training problem associated with calibration datasets.
The analysis exploits an extensive genome-wide gene expression dataset from eight barley varieties showing extensive variation at phenotypic, transcriptional and genotypic levels. The presence of three replicates for each variety gave a perfectly balanced experimental design and ideal data structure for the main aims of the present research as well as a high power to detect differentially expressed genes by the analysis of variance.
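For a balanced design of eight varieties with three replicates each, a per-gene one-way analysis of variance has 7 degrees of freedom between varieties and 16 within. The sketch below computes the F statistic for one gene; the function name and expression values are hypothetical, and converting F to a p-value is left out because it requires an F-distribution routine from a statistics library.

```python
def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for one gene.

    Assumes the within-group sum of squares is non-zero.
    """
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical expression index of one gene: eight varieties, three replicates each.
toy_gene = [
    [7.1, 7.3, 7.0], [9.2, 9.0, 9.4], [5.8, 6.1, 5.9], [7.5, 7.4, 7.7],
    [8.1, 8.3, 8.0], [6.6, 6.4, 6.9], [7.0, 7.2, 7.1], [8.8, 8.6, 8.9],
]
f_stat, df_between, df_within = one_way_anova_f(toy_gene)
```

Applied to two groups of four replicates, the same function gives a single between-group degree of freedom, which is the situation described for the yeast dataset.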
It is clear from the present study that evaluation of the gene expression index is strongly affected by the data extraction method and this in turn has a strong influence on the ability to detect differential gene expression confidently. The seven commonly used methods can be divided into two groups according to the correlation structure in expression indices. Neither the use of different background correction nor normalization procedures could explain the marked variation in expression values estimated from the different methods, as shown previously [ 15 ].
Therefore the differences must be caused by the use of different statistical models to estimate the expression values. Several studies have systematically compared different data extraction methods using tightly controlled calibration datasets, but in doing so, have restricted the comparison to limited amounts of data generated using a limited number of species and platforms [ 10 , 12 , 13 ]. On the one hand, use of calibration datasets simplifies the data modelling, but on the other hand it avoids the challenges involved in modelling real data involving a larger number of sources of uncontrolled variability.
Different studies using Affymetrix spike-in experimental data have tended to produce inconsistent results [9, 12, 23], possibly due to hidden contaminants. Moreover, the results often conflict with those based on realistic biological datasets. The major statistical challenge in using real biological experimental datasets arises from the fact that one cannot know a priori whether or not a given gene is truly differentially expressed.
Therefore in comparing the sensitivity of each of the seven methods to detect differential gene expression, care and attention must be paid to ensure that detected differences in sensitivity among methods are not due to other factors.
The Benjamini and Hochberg [24] false discovery rate (FDR) was used here to control the detection of false positives in a way that was not biased in favour of any particular method.
The seven data extraction methods were explored from several angles, including sensitivity, reproducibility and mutual agreement for the identity of differentially expressed genes. Across a range of FDR levels, the PDNN method had the highest sensitivity to detect differentially expressed genes and this was directly related to the less stringent p-value threshold required by this method to declare differential expression for a given FDR level.
This explains the excellent agreement observed for the differentially expressed genes with all of the other methods. The reproducibility of results from microarray experiments is a critical issue for data analysis methods.
The seven data extraction methods showed varying sensitivities to the inherent biological variation expected within the system; the PDNN method produced the most consistent results across biological replicates, whilst MAS5. In the absence of an expected outcome, detection of differential expression within those genes with single feature polymorphism was used to further assess the ability of each method to detect genuine differential gene expression.
The set of differentially expressed genes identified by the PDNN method was significantly enriched for SFP genes compared to all other methods, reflecting the fact that the method incorporated the sequence information into its calculation of expression indices.
The PDNN method may have the highest accuracy in detecting genuine differential gene expression compared to the other six data extraction methods. Taken together, all comparisons suggest that the PDNN method is superior to its rivals for the detection of differentially expressed genes in the current dataset. In contrast, Shedden et al. To assess the performance of the PDNN method in smaller and more statistically challenging biological datasets, we conducted the same analyses using a genome-wide Affymetrix dataset of gene expression profiled on two divergent yeast strains, each with four biological replicates.
This analysis provided only a single degree of freedom for detecting differential gene expression between yeast strains; therefore, we did not expect it to be as powerful as the barley data analysis.
However, the results were remarkably similar to those obtained in the barley data analysis, further supporting the superiority of the PDNN method over its rivals in detecting differentially expressed genes. However, variation due to the use of different test statistics is smaller than variation due to different processing methods [ 16 , 17 ] so we expect these differences to be robust to the use of different statistical tests.
Nevertheless, each and every method is expected to call one or more differentially expressed genes not called by the other methods.
Therefore even the less sensitive methods may contribute to our understanding of which genes are differentially expressed.
A Human Mitochondrial Resequencing array is available as a catalog product. The array interrogates the entire 16 kb of the mitochondrial genome. Probes are tiled at base pair resolution, depending on the organism, as measured from the central position of adjacent mer oligos.
Array sets are available for a variety of organisms in several different configurations. The probes for interrogating each chromosome are contained on a single array from the whole genome array set and each array is available separately.
Previous generations of all Affymetrix GeneChips are available for users seeking to add new data to existing gene-expression datasets.
The purpose of this activity is to challenge you to analyze and interpret data in a group setting and work through a real-life research problem. All scenarios and results are simplified, but are related to recent real research and medical studies.
Whether it is in medical research and drug manufacturing, insurance, reproductive technology, or public policy, the information from this technology affects everyone in one way or another. It can change the way we live our lives!
An educated and informed public is a key part of making sure this genetic information is used in the most ethical manner possible and for the benefit of all. However, as with all technology, there is the possibility of abuse in a manner that complicates, discriminates or endangers the lives of others. Who has access to the information? In this activity, you will work in groups to analyze a given ethical scenario for either pros or cons or both.
You will discuss and brainstorm ideas with your group, then report out to the class. Under the teacher's direction, a discussion and debate will occur around the topics.
The object is not to "prove" anyone right or wrong, but to look at each issue from both angles and think about your own point of view about these topics. You will also act out a scenario where you are part of an advisory committee for a DNA chip company. You will have to come up with an ethical principle statement for the company.
Goals of Activity
Be more informed about the ethical issues surrounding the use of genetic information in society.
Look at the positives and negatives of the use of DNA chips.
Begin to formulate your own opinions about the ethical dilemmas that DNA chips and genetic testing methods bring to society.
Think about how the ethical principles apply to a biotechnology company.
Procedure
Your teacher will organize the class into small groups.
Each group will be given a different ethical scenario and asked to come up with a list of positives (pros) or negatives (cons), or both, regarding the situation.
Once you have come up with as many points as possible, discuss your list further and narrow it down to 4 to 5 key issues.
Choose a speaker who will report out to the class what your situation was and the list your group came up with.
After time is up, the teacher will go around the room and allow each group to report out and lead a class discussion around the topic.
Once each group is finished and the class has discussed each topic, you will break back into groups and pretend you are on the ethics committee for a company that manufactures DNA chips.