Hi Gordon,
Not much to say - a few notes, but this seems very similar to what I
have proposed - so I like it. The differences don't seem to be that
substantive.
Gordon K Smyth wrote:
Date: Fri, 22 Sep 2006 15:36:10 -0700
From: Robert Gentleman <rgentlem at fhcrc.org>
Subject: [Bioc-devel] affyQA/QC
To: bioc-devel at stat.math.ethz.ch
Hi,
I am trying to put together a set of what one might regard as
standard plots and summary statistics that should be collected on any
set of Affymetrix microarrays (at least ones for gene expression). The
first pass is attached. I would appreciate any comments on it,
especially with regard to things that I have missed, or things I have
suggested that don't seem to be quite correct or could be improved.
On an implementation note - I will be making use of existing software
and intend to work with Craig Parman to put this into the existing
affyQCReport package - users of that might want to let me know what
functionality they are relying on, but these should be strictly additions.
thanks
Robert
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
I'd also be interested in reactions on a standard set of plots and summaries. Below is the set of
tests I've been using recently (based on advice from Ken Simpson). Keith Satterley is
implementing something close to this in the affylmGUI package for the BioC 1.9 release.
Best wishes
Gordon
-------------------
Set of affy QA plots and summaries:
Boxplots of chip-wise intensities:
\begin{Sinput}
library(gcrma)
x <- ReadAffy(filenames=targets$FileName,celfile.path="cel")
narrays <- ncol(exprs(x))
boxplot(x,names=targets$Target,las=2)
\end{Sinput}
Empirical distributions of chip-wise intensities:
\begin{Sinput}
\end{Sinput}
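The code for this step did not survive in the message. As a rough sketch of the idea only (not the original code), per-array density curves can be overlaid in base R. The matrix `log.int` below is simulated and purely hypothetical; in practice the columns would hold the log2 probe intensities of each chip.

```r
# Hypothetical stand-in for log2 probe intensities: 1000 probes x 4 chips.
set.seed(3)
log.int <- matrix(rnorm(4000, mean = 8, sd = 1.5), ncol = 4,
                  dimnames = list(NULL, paste("Chip", 1:4)))
log.int[, 2] <- log.int[, 2] + 1   # shift one chip so it stands out

# One kernel density estimate per chip.
dens <- lapply(seq_len(ncol(log.int)), function(j) density(log.int[, j]))

# Empty plot wide and tall enough for every curve, then overlay them.
xlim <- range(sapply(dens, function(d) range(d$x)))
ylim <- c(0, max(sapply(dens, function(d) max(d$y))))
plot(xlim, ylim, type = "n", xlab = "log2 intensity", ylab = "Density")
for (j in seq_along(dens)) lines(dens[[j]], col = j)
legend("topright", legend = colnames(log.int), col = seq_along(dens), lty = 1)
```

A chip whose curve is shifted or shaped differently from the rest is a candidate for closer inspection.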
RNA digestion plot:
\begin{Sinput}
deg <- AffyRNAdeg(x)
plotAffyRNAdeg(deg,col=1:narrays)
legend("topleft",legend=1:narrays,col=1:narrays,lty=1)
\end{Sinput}
Affy QC parameters:
The bioB spike-ins should be present.
All the other measures should be consistent across chips.
\begin{Sinput}
library(simpleaffy)
qc <- qc.affy(x)
qc.tab <- rbind(
  Percent.present=qc@percent.present,
  Scale.factor=qc@scale.factors,
  Average.background=qc@average.background,
  bioBCalls=qc@bioBCalls=="P",
  t(qc@spikes),
  t(qc@qc.probes))
colnames(qc.tab) <- paste("Chip",1:narrays)
options(digits=2)
qc.tab
\end{Sinput}
Image plots of probe level robust residuals.
Larger residuals are darker and indicate deviations from the additive model used to summarise
probes within each probe-set.
\begin{Sinput}
library(affyPLM)
pset <- fitPLM(x)
oldpar <- par(mfrow=c(4,2),mar=c(1,1,2,1))
image(pset, type="resids") # red=positive resids, blue=negative
par(oldpar)
\end{Sinput}
I thought about this, but for a lot of arrays it seems like it would
be better to come back and concentrate on those that were indicated for
other reasons.
Normalized Unscaled Standard Errors (NUSE) plot.
The standard error estimates obtained for each gene on each array from fitPLM
are standardized across arrays so that the median standard error for that
gene is 1 across all arrays.
An array with elevated SEs relative to other arrays is typically of
lower quality.
\begin{Sinput}
\end{Sinput}
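A minimal sketch of the standardization described above, in base R on simulated values. The matrix `se` is hypothetical and only stands in for the per-gene, per-array standard errors that would come from the fitPLM fit; this is an illustration of the calculation, not the affyPLM implementation.

```r
# Hypothetical standard errors: 100 genes (rows) x 4 arrays (columns).
set.seed(1)
se <- matrix(rgamma(400, shape = 20, rate = 100), nrow = 100,
             dimnames = list(NULL, paste("Chip", 1:4)))
se[, 4] <- se[, 4] * 1.3   # make one array noisier than the others

# Divide each gene's SEs by that gene's median across arrays,
# so that every row has median 1.
nuse <- sweep(se, 1, apply(se, 1, median), "/")

# Arrays whose box sits above 1 have elevated SEs relative to the rest.
boxplot(nuse, las = 2, ylab = "NUSE")
abline(h = 1, lty = 2)
```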
Relative Log Expression (RLE) values.
RLE values are computed for each probeset by comparing the expression value
on each array against the median expression value for that probeset across all arrays.
Assuming that most genes are not changing in expression across arrays means ideally
most of these RLE values will be near 0.
When examining this plot, focus should be
on the shape and position of each of the boxes.
Typically arrays with poorer quality
show up with boxes that are not centered about 0 and/or are more spread out.
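The RLE calculation itself is simple enough to sketch in base R. The expression matrix `expr` below is simulated and hypothetical; in practice it would hold the log-scale probeset summaries for each array. This only illustrates the definition given above.

```r
# Hypothetical log2 expression values: 100 probesets (rows) x 4 arrays.
set.seed(2)
probeset.level <- rnorm(100, mean = 8, sd = 2)   # per-probeset baseline
expr <- probeset.level + matrix(rnorm(400, sd = 0.2), nrow = 100)
colnames(expr) <- paste("Chip", 1:4)
expr[, 3] <- expr[, 3] + 0.5   # shift one array to mimic a poor chip

# Subtract each probeset's median across arrays from its values.
rle.vals <- sweep(expr, 1, apply(expr, 1, median), "-")

# Boxes that are off-centre or wide flag arrays worth a closer look.
boxplot(rle.vals, las = 2, ylab = "RLE")
abline(h = 0, lty = 2)
```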
\begin{Sinput}