[Bioc-devel] Bug in rma: zero variance of probeset summaries across arrays
The rma command works fine on my system - fresh svn checkout of R- devel from today, all relevant packages updated. I did not load genefilter as you do below, but aside from that it is the same. I have only tried once, and I ran it with R CMD BATCH. 2 comments: 1) It seems to complain about colnames and an object with no dimension. Perhaps a drop = FALSE needs to be put in place somewhere in the code (but why it works for me, I have no idea about) 2) In general, I would also have put a colnames = "pm" on the probeset matrices you make below. Hard to know whether it is a requirement in all parts of the code, but seems safer. Kasper
On Feb 3, 2009, at 11:14 , Wolfgang Huber wrote:
Thank you, Holger, Jim, Ben,
(1) this is indeed a peculiar feature of the medianpolish part of
RMA. I apologize to Ben for using the harsh word "bug"
inappropriately in this context. I was misled by observations that
using different normalisation or background correction methods
seemed to "fix" the problem, and had forgotten about those previous
threads. Sorry.
I wonder whether anyone has done research on how to improve on
medianpolish for small datasets and why people haven't adapted it.
(2) One further potential problem remains, and I hope that bringing
this up might help to make sure that rma (which is obviously a core
component of Bioconductor) is robust even under the most adverse
circumstances (e.g. myself :). With the script below, I get a core
dump. This is on a freshly checked out and built R (from today) and
all packages then installed from scratch (sessionInfo as in previous
mail below). It does not seem to happen on R-2.8.1. and Bioc release
2.3., and even on R-dev, whether it happens seems to depend on the
history of the R session. One hypothesis is that this could be
related to the "destructive=TRUE" parameter of rma?
Can anyone reproduce this?
Best wishes
Wolfgang
library("affy")
library("genefilter")
options(error=recover)
files = c("PP_T1_1.CEL", "PP_T1_2.CEL", "PP_T2_1.CEL", "PP_T2_2c.CEL")
#celdir = tempdir()
#for(f in files)
# download.file(file.path("http://www.ebi.ac.uk/~huber/pub", f),
# file.path(celdir, f))
celdir="."
a = ReadAffy(filenames=file.path(celdir, files))
## Define a CDF with one probeset per probe
testcdf = new.env(hash=TRUE)
for(i in seq_len(nrow(exprs(a))))
assign(paste(i), matrix(i, ncol=1, nrow=1), envir=testcdf)
a at cdfName = "testcdf"
x = rma(a)
----------------------
Background correcting
Normalizing
Calculating Expression
Error in `colnames<-`(`*tmp*`, value = c("PP_T1_1.CEL",
"PP_T1_2.CEL", :
attempt to set colnames on object with less than two dimensions
Enter a frame number, or 0 to exit
1: source("rma2.R")
2: eval.with.vis(ei, envir)
3: eval.with.vis(expr, envir, enclos)
4: rma(a)
5: `colnames<-`(`*tmp*`, value = c("PP_T1_1.CEL", "PP_T1_2.CEL",
"PP_T2_1.CEL"
Selection: 4
Called from: eval(expr, envir, enclos)
Browse[1]> ls()
[1] "*tmp*" "background" "bgversion" "cols"
"destructive"
[6] "exprs" "ngenes" "normalize" "object" "pNList"
[11] "rows" "subset" "verbose"
Browse[1]> str(exprs)
........ long dump of text ....
496: stop("attempt to set colnames on object with less than two
dimensions")
497: `colnames<-`(`*tmp*`, value = c("PP_T1_1.CEL", "PP_T1_2.CEL",
"PP_T2_1.CEL", "PP_T2_2c.CEL"))
498: rma(a)
499: eval.with.vis(expr, envir, enclos)
500: eval.with.vis(ei, envir)
501: source("rma2.R")
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:
James W. MacDonald wrote:
Hi Holger, Holger Schwender wrote:
Hi Wolfgang, the difference between the two outputs x1 and x2 are due to that in x1 the probe intensities are background-corrected first and then normalized, while in x2 they are first normalized and then background-corrected. The constant values are a feature of median polish: If median polish is applied to four or six samples, then this can lead to constant values for some of the probe sets. This has been mentioned years ago by Jim (I couldn't find his original email on this topic in the mailing list archive).
Actually this only occurs with a small, odd number of samples, not even. Well, it could happen for a small number of probesets if the n/2 and n/2 + 1 ordered values were identical, but it is much more common in the case of 3 or 5 samples. Best, Jim
Best, Holger -------- Original-Nachricht --------
Datum: Tue, 03 Feb 2009 15:38:06 +0100 Von: Wolfgang Huber <huber at ebi.ac.uk> An: Bioconductor Developers <bioc-devel at stat.math.ethz.ch> Betreff: [Bioc-devel] Bug in rma: zero variance of probeset summaries across arrays
Hi,
I am getting puzzling results from the rma function in the affy
package, and furthermore they are different depending on whether
quantile normalisation is called within rma or explicitely. The
puzzling part is that some probesets, in a perfectly ordinary
experiment with 4 different arrays, are summarized with exactly
identical values across all arrays (i.e. zero variance); while the
underlying probe level data for these probesets is variable and
sits right in the middle of the dynamic range (i.e. not at any of
the extremes).
The code: ---------
library("affy") library("genefilter") library("preprocessCore")
options(error=recover)
files = c("PP_T1_1.CEL", "PP_T1_2.CEL", "PP_T2_1.CEL",
"PP_T2_2c.CEL")
for(f in files) download.file(file.path("http://www.ebi.ac.uk/~huber/pub
", f), file.path(tempdir(), f))
a = ReadAffy(filenames=file.path(tempdir(), files))
x1 = rma(a)
aq = a exprs(aq) = normalize.quantiles(exprs(aq)) x2 = rma(aq,
normalize=FALSE)
probeVars = function(x) { r = rowSds(exprs(x)) ps =
names(which(r==0)) cat(sprintf("%s: %d probes with zero
variance.\n", deparse(substitute(x)), length(ps))) return(ps) }
probeVars(a) ps = probeVars(x1) probeVars(x2)
print(exprs(a)[yeast2cdf[[ps[1]]][,1],])
-----------------------------------------------------------------------------
The output: ----------- Background correcting Normalizing
Calculating Expression Background correcting Calculating Expression
a: 0 probes with zero variance. x1: 3 probes with zero variance.
x2: 0 probes with zero variance.
PP_T1_1.CEL PP_T1_2.CEL PP_T2_1.CEL PP_T2_2c.CEL 35244 185
236 269 253 198153 226 314
320 213 48500 375 370 408
373 109094 125 157 201 175 144686
107 131 153 100 214110 181
296 307 234 47302 355 430
397 282 108656 117 130 150
118 147528 158 144 159 135 237691
208 244 253 169 229910 94
138 147 133
sessionInfo()
R version 2.9.0 Under development (unstable) (2009-02-03 r47829) x86_64-unknown-linux-gnu locale: LC_CTYPE = it_IT .UTF -8 ;LC_NUMERIC = C ;LC_TIME = it_IT .UTF -8 ;LC_COLLATE = it_IT .UTF -8 ;LC_MONETARY = C ;LC_MESSAGES = it_IT .UTF -8 ;LC_PAPER = it_IT .UTF -8 ;LC_NAME = C ;LC_ADDRESS =C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices datasets utils methods base other attached packages: [1] yeast2cdf_2.3.0 preprocessCore_1.5.3 genefilter_1.23.1 [4] affy_1.21.7 Biobase_2.3.10 fortunes_1.3-6 loaded via a namespace (and not attached): [1] affyio_1.11.3 annotate_1.21.3 AnnotationDbi_1.5.14 [4] DBI_0.2-4 RSQLite_0.7-1 splines_2.9.0 [7] survival_2.34-1 tools_2.9.0 xtable_1.5-4 Best wishes Wolfgang PS While trying to dissect this, I also experimented with constructing special CDF environments (e.g. one that had one 'probeset' for each probe), and I noted that repeatedly the C code of rma caused core dumps. Perhaps a better response to unexpected shapes of CDF environments would be to issue an error message. I can provide details if interested. ---------------------------------------------------- Wolfgang Huber, EMBL-EBI, http://www.ebi.ac.uk/huber
_______________________________________________ Bioc-devel at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
_______________________________________________ Bioc-devel at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
-- ---------------------------------------------------- Wolfgang Huber EMBL-EBI http://www.ebi.ac.uk/huber
_______________________________________________ Bioc-devel at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel