[Bioc-devel] affyQA/QC

3 messages · Gordon K Smyth, Robert Gentleman, Caroline Johnston

Gordon K Smyth

Sat, Sep 23, 2006 4:02 AM

I'd also be interested in reactions on a standard set of plots and summaries.  Below is the set of
tests I've been using recently (based on advice from Ken Simpson).  Keith Satterley is
implementing something close to this in the affylmGUI package for the BioC 1.9 release.

Best wishes
Gordon

-------------------
Set of affy QA plots and summaries:

Boxplots of chip-wise intensities:
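\begin{Sinput}

library(gcrma)
# 'targets' is assumed to be a data frame with FileName and Target columns
x <- ReadAffy(filenames=targets$FileName,celfile.path="cel")
narrays <- ncol(exprs(x))
boxplot(x,names=targets$Target,las=2)

\end{Sinput}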

Empirical distributions of chip-wise intensities:
\begin{Sinput}

hist(x)

\end{Sinput}

RNA digestion plot:
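\begin{Sinput}

deg <- AffyRNAdeg(x)
plotAffyRNAdeg(deg,col=1:narrays)
legend("topleft",legend=1:narrays,col=1:narrays,lty=1)

\end{Sinput}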

Affy QC parameters:
The bioB spike-ins should be present.
All the other measures should be consistent across chips.
\begin{Sinput}

library(simpleaffy)
qc <- qc.affy(x)
qc.tab <- rbind(
     Percent.present=qc@percent.present,
     Scale.factor=qc@scale.factors,
     Average.background=qc@average.background,
     bioBCalls=qc@bioBCalls=="P",
     t(qc@spikes),
     t(qc@qc.probes))
colnames(qc.tab) <- paste("Chip",1:narrays)
options(digits=2)
qc.tab

\end{Sinput}
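
One way to make "consistent across chips" concrete is to flag chips whose value for a measure is far from the median chip. The following is only a base-R sketch: the scale factors are made-up numbers, and the 3-fold cut-off (max.fold) is an assumed threshold chosen for illustration, not something taken from the table above.

```r
# Made-up scale factors for five chips (illustration only).
scale.factor <- c("Chip 1"=1.1, "Chip 2"=0.9, "Chip 3"=1.3,
                  "Chip 4"=4.2, "Chip 5"=1.0)

# Assumed threshold: flag chips more than 3-fold away from the median chip.
max.fold <- 3

# Fold difference of each chip relative to the median scale factor.
fold <- scale.factor / median(scale.factor)

# Names of chips that fall outside the assumed range.
flagged <- names(fold)[fold > max.fold | fold < 1/max.fold]
flagged
```

The same comparison could be applied to any numeric row of a QC table, with a threshold suited to that measure.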

Image plots of probe level robust residuals.
Larger residuals are darker and indicate deviations from the additive model used to summarise
probes within each probe-set.
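\begin{Sinput}

library(affyPLM)
pset <- fitPLM(x)
oldpar <- par(mfrow=c(4,2),mar=c(1,1,2,1))
image(pset, type="resids") # red=positive resids, blue=negative
par(oldpar)

\end{Sinput}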

Normalized Unscaled Standard Errors (NUSE) plot.
The standard error estimates obtained for each gene on each array from fitPLM
are standardized across arrays so that the median standard error for each
gene is 1 across all arrays.
An array with elevated SEs relative to other arrays is typically of
lower quality.
\begin{Sinput}

NUSE(pset)

\end{Sinput}
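
The standardization described above can be sketched in base R on a plain matrix of standard errors. This is not the affyPLM implementation, only an illustration of the idea; the matrix se and its values are made up.

```r
# Toy standard errors: rows are genes (probesets), columns are arrays.
# The values are made up for illustration only.
se <- matrix(c(0.10, 0.12, 0.20,
               0.30, 0.28, 0.55,
               0.05, 0.05, 0.11,
               0.20, 0.22, 0.35),
             nrow=4, byrow=TRUE,
             dimnames=list(paste0("gene", 1:4), paste("Chip", 1:3)))

# Divide each row by its own median so every gene has median 1 across arrays.
nuse <- sweep(se, 1, apply(se, 1, median), "/")

# One box per array; a box sitting well above 1 suggests lower quality.
boxplot(nuse, las=2, ylab="NUSE")
abline(h=1, lty=2)

# Per-array median as a one-number summary.
nuse.median <- apply(nuse, 2, median)
nuse.median
```

In this toy example the third array has the largest median, which is the pattern one would look for in the real plot.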

Relative Log Expression (RLE) values.
RLE values are computed for each probeset by comparing the expression value
on each array against the median expression value for that probeset across all arrays.
Assuming that most genes do not change in expression across arrays, most of these
RLE values should ideally be near 0.
When examining this plot, focus on the shape and position of each box.
Arrays of poorer quality typically show boxes that are not centered about 0
and/or are more spread out.
\begin{Sinput}

RLE(pset)

\end{Sinput}
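
The RLE computation described above can likewise be sketched in base R on a matrix of log-scale expression values. Again this is only an illustration, not the affyPLM code; the matrix expr and its values are made up.

```r
# Toy log2 expression values: rows are probesets, columns are arrays.
# The values are made up for illustration only.
expr <- matrix(c(7.0, 7.1,  7.9,
                 9.5, 9.4, 10.2,
                 5.2, 5.2,  6.3,
                 8.0, 8.1,  8.0),
               nrow=4, byrow=TRUE,
               dimnames=list(paste0("probeset", 1:4), paste("Chip", 1:3)))

# Subtract each probeset's median across arrays from its values.
rle.values <- sweep(expr, 1, apply(expr, 1, median), "-")

# One box per array; look for boxes off 0 or unusually spread out.
boxplot(rle.values, las=2, ylab="RLE")
abline(h=0, lty=2)

# Position (median) and spread (IQR) of each array's box.
rle.summary <- rbind(Median=apply(rle.values, 2, median),
                     IQR=apply(rle.values, 2, IQR))
rle.summary
```

Here the third array's box is shifted away from 0, which is the kind of array the plot is meant to pick out.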


Robert Gentleman

Tue, Sep 26, 2006 10:34 AM

Hi Gordon,
   Not much to say - a few notes, but this seems very similar to what I 
have proposed - so I like it.  The differences don't seem to be that 
substantive.

Gordon K Smyth wrote:

Date: Fri, 22 Sep 2006 15:36:10 -0700
From: Robert Gentleman <rgentlem at fhcrc.org>
Subject: [Bioc-devel] affyQA/QC
To: bioc-devel at stat.math.ethz.ch

Hi,
   I am trying to put together a set of what one might regard as
standard plots and summary statistics that should be collected on any
set of Affymetrix microarrays (at least ones for gene expression). The
first pass is attached; I would appreciate any comments on it,
especially with regard to things that I have missed, or things I have
suggested that don't seem to be quite correct or could be improved.

  On an implementation note - I will be making use of existing software
and intend to work with Craig Parman to put this into the existing
affyQCReport package. Users of that package might want to let me know what
functionality they rely on, but these changes should be strictly additions.

  thanks
    Robert

--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org


I'd also be interested in reactions on a standard set of plots and summaries.  Below is the set of
tests I've been using recently (based on advice from Ken Simpson).  Keith Satterley is
implementing something close to this in the affylmGUI package for the BioC 1.9 release.

Best wishes
Gordon

-------------------
Set of affy QA plots and summaries:

Boxplots of chip-wise intensities:
\begin{Sinput}

library(gcrma)
# 'targets' is assumed to be a data frame with FileName and Target columns
x <- ReadAffy(filenames=targets$FileName,celfile.path="cel")
narrays <- ncol(exprs(x))
boxplot(x,names=targets$Target,las=2)

\end{Sinput}

Empirical distributions of chip-wise intensities:
\begin{Sinput}

hist(x)

\end{Sinput}

RNA digestion plot:
\begin{Sinput}

deg <- AffyRNAdeg(x)
plotAffyRNAdeg(deg,col=1:narrays)
legend("topleft",legend=1:narrays,col=1:narrays,lty=1)

\end{Sinput}

Affy QC parameters:
The bioB spike-ins should be present.
All the other measures should be consistent across chips.
\begin{Sinput}

library(simpleaffy)
qc <- qc.affy(x)
qc.tab <- rbind(
     Percent.present=qc@percent.present,
     Scale.factor=qc@scale.factors,
     Average.background=qc@average.background,
     bioBCalls=qc@bioBCalls=="P",
     t(qc@spikes),
     t(qc@qc.probes))
colnames(qc.tab) <- paste("Chip",1:narrays)
options(digits=2)
qc.tab

\end{Sinput}

Image plots of probe level robust residuals.
Larger residuals are darker and indicate deviations from the additive model used to summarise
probes within each probe-set.
\begin{Sinput}

library(affyPLM)
pset <- fitPLM(x)
oldpar <- par(mfrow=c(4,2),mar=c(1,1,2,1))
image(pset, type="resids") # red=positive resids, blue=negative
par(oldpar)

\end{Sinput}

I thought about this, but for a lot of arrays it seems like it would 
be better to come back and concentrate on those that were indicated for 
other reasons.

Seems very similar -

Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org

Caroline Johnston

Wed, Sep 27, 2006 2:23 AM

Hi,

I don't know whether this would be of any use, but I've been trying to
develop a web front-end to some of the Bioconductor tools, which so far
does a lot of the affy quality assessment stuff. The biologists we work
with aren't comfortable with the command line. I've tried to design it to
make it as easy as possible to add new components (I added the simpleaffy
QC stuff in about half an hour). It uses the Perl Catalyst web framework
and templating to generate web pages and R scripts. It's only running on
the server at the moment, so it can be pretty slow, but we're planning on
sending the R jobs off to other machines soon.

If you want to have a look it's at http://bioinformatics.essex.ac.uk/ROME
(and trac/svn from http://rome.devjavu.com/ , though I wouldn't advise
trying to install it just yet). Upload of data is disabled during testing,
but if you register you'll get some demo data to play with. There's also a
lack of documentation, but if you log in as 'thing' with password 'thing'
and go to session->datafiles you can have a look at some datafiles and
image files I've already created.

If anyone fancies helping out with development, give me a shout.

Cheers,

Cass.

On Tue, 26 Sep 2006, Robert Gentleman wrote:

Hi Gordon,
   Not much to say - a few notes, but this seems very similar to what I
have proposed - so I like it.  The differences don't seem to be that
substantive.


Gordon K Smyth wrote:

Date: Fri, 22 Sep 2006 15:36:10 -0700
From: Robert Gentleman <rgentlem at fhcrc.org>
Subject: [Bioc-devel] affyQA/QC
To: bioc-devel at stat.math.ethz.ch

Hi,
   I am trying to put together a set of what one might regard as
standard plots and summary statistics that should be collected on any
set of Affymetrix microarrays (at least ones for gene expression). The
first pass is attached; I would appreciate any comments on it,
especially with regard to things that I have missed, or things I have
suggested that don't seem to be quite correct or could be improved.

  On an implementation note - I will be making use of existing software
and intend to work with Craig Parman to put this into the existing
affyQCReport package. Users of that package might want to let me know what
functionality they rely on, but these changes should be strictly additions.

  thanks
    Robert

--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org


I'd also be interested in reactions on a standard set of plots and summaries.  Below is the set of
tests I've been using recently (based on advice from Ken Simpson).  Keith Satterley is
implementing something close to this in the affylmGUI package for the BioC 1.9 release.

Best wishes
Gordon

-------------------
Set of affy QA plots and summaries:

Boxplots of chip-wise intensities:
\begin{Sinput}

library(gcrma)
# 'targets' is assumed to be a data frame with FileName and Target columns
x <- ReadAffy(filenames=targets$FileName,celfile.path="cel")
narrays <- ncol(exprs(x))
boxplot(x,names=targets$Target,las=2)

\end{Sinput}

Empirical distributions of chip-wise intensities:
\begin{Sinput}

hist(x)

\end{Sinput}

RNA digestion plot:
\begin{Sinput}

deg <- AffyRNAdeg(x)
plotAffyRNAdeg(deg,col=1:narrays)
legend("topleft",legend=1:narrays,col=1:narrays,lty=1)

\end{Sinput}

Affy QC parameters:
The bioB spike-ins should be present.
All the other measures should be consistent across chips.
\begin{Sinput}

library(simpleaffy)
qc <- qc.affy(x)
qc.tab <- rbind(
     Percent.present=qc@percent.present,
     Scale.factor=qc@scale.factors,
     Average.background=qc@average.background,
     bioBCalls=qc@bioBCalls=="P",
     t(qc@spikes),
     t(qc@qc.probes))
colnames(qc.tab) <- paste("Chip",1:narrays)
options(digits=2)
qc.tab

\end{Sinput}

Image plots of probe level robust residuals.
Larger residuals are darker and indicate deviations from the additive model used to summarise
probes within each probe-set.
\begin{Sinput}

library(affyPLM)
pset <- fitPLM(x)
oldpar <- par(mfrow=c(4,2),mar=c(1,1,2,1))
image(pset, type="resids") # red=positive resids, blue=negative
par(oldpar)

\end{Sinput}

   I thought about this, but for a lot of arrays it seems like it would
be better to come back and concentrate on those that were indicated for
other reasons.

Normalized Unscaled Standard Errors (NUSE) plot.
The standard error estimates obtained for each gene on each array from fitPLM
are standardized across arrays so that the median standard error for each
gene is 1 across all arrays.
An array with elevated SEs relative to other arrays is typically of
lower quality.
\begin{Sinput}

NUSE(pset)

\end{Sinput}

Relative Log Expression (RLE) values.
RLE values are computed for each probeset by comparing the expression value
on each array against the median expression value for that probeset across all arrays.
Assuming that most genes do not change in expression across arrays, most of these
RLE values should ideally be near 0.
When examining this plot, focus on the shape and position of each box.
Arrays of poorer quality typically show boxes that are not centered about 0
and/or are more spread out.
\begin{Sinput}

RLE(pset)

\end{Sinput}


   Seems very similar -