
[Bioc-devel] affyQA/QC

3 messages · Gordon K Smyth, Robert Gentleman, Caroline Johnston

#
I'd also be interested in reactions to a standard set of plots and summaries.  Below is the set of
tests I've been using recently (based on advice from Ken Simpson).  Keith Satterley is
implementing something close to this in the affylmGUI package for the BioC 1.9 release.

Best wishes
Gordon

-------------------
Set of affy QA plots and summaries:

Boxplots of chip-wise intensities:
\begin{Sinput}
\end{Sinput}

Empirical distributions of chip-wise intensities:
\begin{Sinput}
\end{Sinput}
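
The commands for the two displays above did not survive in the archive. The following is only a minimal sketch of the idea in base R, not the original code. It assumes the log2 probe intensities are available as a probes-by-chips matrix; the matrix `x` and its fourth, brighter chip are simulated purely for illustration. With a real AffyBatch, the `boxplot` and `hist` methods in the affy package give the corresponding displays directly.

```r
# Simulated stand-in for log2 probe intensities: 1000 probes x 4 chips.
# With real data, 'x' would be log2 of the probe-level intensity matrix.
set.seed(1)
x <- matrix(rnorm(4000, mean = 8, sd = 1.5), ncol = 4,
            dimnames = list(NULL, paste0("chip", 1:4)))
x[, 4] <- x[, 4] + 1   # hypothetical chip that is brighter overall

# Boxplots of chip-wise intensities
boxplot(as.data.frame(x), ylab = "log2 intensity")

# Empirical distributions: one kernel density curve per chip
dens <- lapply(seq_len(ncol(x)), function(j) density(x[, j]))
ymax <- max(sapply(dens, function(d) max(d$y)))
plot(dens[[1]], ylim = c(0, ymax), main = "", xlab = "log2 intensity")
for (j in seq_along(dens)[-1]) lines(dens[[j]], col = j)

# A simple numeric companion to the plots: per-chip medians
chip_medians <- apply(x, 2, median)
```

A chip whose box or density curve sits well apart from the others, like the simulated fourth chip here, is the kind of outlier these plots are meant to expose.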

RNA digestion plot:
\begin{Sinput}
\end{Sinput}

Affy QC parameters:
The bioB spike-ins should be present.
All the other measures should be consistent across chips.
\begin{Sinput}
+      Percent.present=qc@percent.present,
+      Scale.factor=qc@scale.factors,
+      Average.background=qc@average.background,
+      bioBCalls=qc@bioBCalls=="P",
+      t(qc@spikes),
+      t(qc@qc.probes))
\end{Sinput}

Image plots of probe level robust residuals.
Larger residuals are darker and indicate deviations from the additive model used to summarise
probes within each probe-set.
\begin{Sinput}
\end{Sinput}
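
To make the notion of residuals from an additive model concrete, here is a small sketch for a single simulated probe-set. It is an assumption-laden illustration, not the original command: it uses base R's `medpolish` as a simple robust stand-in for the fit, which is not the same estimator as the robust regression used by fitPLM, and the probe-set `y` with its artefact on probe 5 of chip 2 is invented for the example.

```r
# One simulated probe-set: 11 probes x 4 chips on the log2 scale,
# generated from an additive model (probe effect + chip effect + noise).
set.seed(3)
probe_eff <- rnorm(11, sd = 1)
chip_eff  <- c(0, 0.2, -0.1, 0.1)
y <- 8 + outer(probe_eff, chip_eff, "+") + matrix(rnorm(44, sd = 0.1), 11, 4)
y[5, 2] <- y[5, 2] + 3   # hypothetical artefact on probe 5 of chip 2

# Robust additive fit by median polish (a stand-in, not the fitPLM estimator)
fit <- medpolish(y, trace.iter = FALSE)
res <- fit$residuals

# Image of absolute residuals: larger residuals are drawn darker
image(t(abs(res)), col = gray(seq(1, 0, length.out = 32)),
      xlab = "chip", ylab = "probe", axes = FALSE)

# Locate the cell that deviates most from the additive model
worst <- which(abs(res) == max(abs(res)), arr.ind = TRUE)
```

In the real image plots the residuals are laid out by physical position on the chip, so that spatial artefacts such as scratches or bubbles show up as dark patches.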

Normalized Unscaled Standard Errors (NUSE) plot.
The standard error estimates obtained for each gene on each array from fitPLM
are standardized across arrays so that the median standard error for each
gene is 1 across all arrays.
An array with elevated SEs relative to other arrays is typically of
lower quality.
\begin{Sinput}
\end{Sinput}
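
The standardization described above can be sketched in a few lines of base R. This is an illustration only, not the original command: the matrix `se` of per-gene standard errors is simulated, and its third chip is deliberately given inflated values. In practice the standard errors would come from the fitPLM fit.

```r
# Simulated standard errors: 500 genes x 4 chips (stand-in for fitPLM output)
set.seed(2)
se <- matrix(rgamma(500 * 4, shape = 20, rate = 200), ncol = 4,
             dimnames = list(NULL, paste0("chip", 1:4)))
se[, 3] <- se[, 3] * 1.3   # hypothetical lower-quality chip

# Divide each gene's SEs by that gene's median SE across chips,
# so every row of 'nuse' has median 1
nuse <- se / apply(se, 1, median)

# One box per chip; boxes sitting above 1 indicate elevated SEs
boxplot(as.data.frame(nuse), ylab = "NUSE")
abline(h = 1, lty = 2)

nuse_medians <- apply(nuse, 2, median)
```

Here the box for the simulated third chip sits clearly above the reference line at 1, which is the pattern expected for a lower-quality array.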

Relative Log Expression (RLE) values.
RLE values are computed for each probeset by comparing the expression value
on each array against the median expression value for that probeset across all arrays.
If most genes are not changing in expression across arrays, then ideally
most of these RLE values will be near 0.
When examining this plot, focus
on the shape and position of each box.
Arrays of poorer quality typically
show boxes that are not centered on 0, are more spread out, or both.
\begin{Sinput}
\end{Sinput}
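
The RLE calculation itself is simple enough to sketch in base R. This is a minimal illustration rather than the original command: the expression matrix `expr` is simulated, with a hypothetical fourth chip that is both shifted and noisier. With real data the log2 expression values would come from the probe-level model fit.

```r
# Simulated log2 expression values: 500 probesets x 4 chips
set.seed(4)
base_level <- rnorm(500, mean = 8, sd = 2)          # per-probeset level
expr <- base_level + matrix(rnorm(2000, sd = 0.1), ncol = 4,
                            dimnames = list(NULL, paste0("chip", 1:4)))
# Hypothetical poor chip: shifted upward and more variable
expr[, 4] <- expr[, 4] + 0.5 + rnorm(500, sd = 0.3)

# RLE: subtract each probeset's median across chips from its values
rle <- sweep(expr, 1, apply(expr, 1, median), "-")

# One box per chip; good chips are centered on 0 with a small spread
boxplot(as.data.frame(rle), ylab = "RLE")
abline(h = 0, lty = 2)

rle_medians <- apply(rle, 2, median)
rle_iqr     <- apply(rle, 2, IQR)
```

The simulated fourth chip shows both symptoms mentioned above: its box is displaced from 0 and wider than the others.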
3 days later
#
Hi Gordon,
   Not much to say - a few notes, but this seems very similar to what I 
have proposed - so I like it.  The differences don't seem to be that 
substantive.
Gordon K Smyth wrote:
I thought about this, but for a lot of arrays it seems like it would 
be better to come back and concentrate on those that were indicated for 
other reasons.
Seems very similar -

#
Hi,

I don't know whether this would be of any use, but I've been trying to
develop a web front-end to some of the Bioconductor tools, which so far
does a lot of the affy quality assessment stuff. The biologists we work
with aren't comfortable with the command line. I've tried to design it to
make it as easy as possible to add new components (I added the simpleaffy
QC stuff in about half an hour). It uses the Perl Catalyst web framework
and templating to generate web pages and R scripts. It's only running on
the server at the moment, so it can be pretty slow, but we're planning on
sending the R jobs off to other machines soon.

If you want to have a look it's at http://bioinformatics.essex.ac.uk/ROME
(and trac/svn from http://rome.devjavu.com/ , though I wouldn't advise
trying to install it just yet). Upload of data is disabled during testing,
but if you register you'll get some demo data to play with. There's also a
lack of documentation, but if you log in as 'thing' with password 'thing'
and go to session->datafiles you can have a look at some datafiles and
image files I've already created.

If anyone fancies helping out with development, give me a shout.

Cheers,

Cass.
On Tue, 26 Sep 2006, Robert Gentleman wrote: