[Bioc-devel] strange bug in a BioC workflow that only appears on Jenkins
Hi Dan, wow, thanks a lot for identifying the problem! This is great and gives me a hint on what to look into. Thanks again, Bernd
On Fri, 2015-12-04 at 10:55 -0800, Dan Tenenbaum wrote:
----- Original Message -----
From: "Bernd Klaus" <bernd.klaus at embl.de>
To: "bioc-devel" <bioc-devel at r-project.org>
Sent: Thursday, December 3, 2015 3:41:51 AM
Subject: [Bioc-devel] strange bug in a BioC workflow that only appears on Jenkins
Dear all,

I am currently developing an end-to-end workflow for microarray analysis. In this workflow I download some clinical microarray data (CEL files) from ArrayExpress, import it with oligo, annotate it using the appropriate ChipDb package and then obtain results with limma. This gives me a data.frame "tableC" with the results from limma.

The data set contains paired inflamed/non-inflamed (I/nI) mucosa samples from patients with Crohn's disease (C) or ulcerative colitis (U). In the workflow I only analyse the differences between I/nI samples within the patients in C, which yields the limma results table "tableC". I then want to extract the probeset IDs of the DE genes like so:

DEgenesCD <- rownames(base::subset(tableC, adj.P.Val < 0.1))

Now, on my local computer(s) this gives me something like

message(paste0(as.character(DEgenesCD)[1:5], collapse = "--"))
## 7928695--8123695--8164535--8009746--7952249

However, on the CI system I get

## NA--NA--NA--NA--NA

So it seems that the content of tableC somehow "disappears". See e.g.

http://docbuilder.bioconductor.org:8080/job/maEndToEnd/58/label=win buil der1/console

The minimal dummy workflow that reproduces the bug is in svn:

https://hedgehog.fhcrc.org/bioconductor/trunk/madman/workflows/maEndToEnd/

Strangely enough, if I run it on my local machine, save the expression data as an RData object, commit this object to svn and then load the pre-saved object in the workflow, it builds successfully. So my best guess is that something unusual happens during the creation of the eSet from the downloaded data, which then somehow affects the result table from limma.

I have been trying to chase this bug for about three weeks, so any input would be very much appreciated ...
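One detail worth keeping in mind while reading the output above (an observation about R's indexing, not a diagnosis of the CI failure): NA--NA--NA--NA--NA is exactly what that message() call prints when the subset is empty, because indexing a zero-length vector with [1:5] pads the result with NA. So the output alone does not prove that the contents of tableC vanished; it is also consistent with no probeset passing adj.P.Val < 0.1. The table below is a made-up stand-in, not the real limma output.

```r
# Made-up stand-in for the limma table: two probesets, neither below the cutoff.
tableC <- data.frame(adj.P.Val = c(0.45, 0.80),
                     row.names = c("7928695", "8123695"))

DEgenesCD <- rownames(base::subset(tableC, adj.P.Val < 0.1))
length(DEgenesCD)    # 0: no row passes the filter

# Indexing beyond the end of a vector returns NA, so [1:5] on an empty
# result yields five NA values, which paste0() turns into the string "NA".
first5 <- as.character(DEgenesCD)[1:5]
msg <- paste0(first5, collapse = "--")
message(msg)         # NA--NA--NA--NA--NA
```

Checking length(DEgenesCD) or nrow(tableC) in the build log is therefore a cheap first step before assuming the table itself is corrupted.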
So the workflow builder creates a package from your Rmd file and then
tries to "R CMD build" the package.
On the workflow builder for Linux, I can purl() the Rmd file to
create an R file and then source() it in R without any errors.
However, when I try and "R CMD build" the package generated from this
Rmd file, I get the same error you report.
Note that R CMD build runs a script which can be found at
$R_HOME/bin/build, and that script invokes R in a special way. If I
invoke R the same way:
R_DEFAULT_PACKAGES= LC_COLLATE=C R
...and then source the purl()'ed Rmd file, I do get the same error as
you report.
So the issue has to do with one of those two environment variables.
Starting R and only changing one of the variables:
R_DEFAULT_PACKAGES= R
works, but if I change only the other one:
LC_COLLATE=C R
it fails. So the problem has to do with the setting of LC_COLLATE.
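Because a difference like this is invisible in the build output, one option (an assumption on my part, not something suggested in the thread) is to have the workflow print the relevant settings itself, so the Jenkins log shows which collation and which attached packages were in effect. A minimal sketch:

```r
# Log the settings that R CMD build changes, so they appear in the build log.
# unset = NA distinguishes "not set" from "set to the empty string".
build_env <- Sys.getenv(c("R_DEFAULT_PACKAGES", "LC_COLLATE"), unset = NA)
print(build_env)

# The collation R is actually using, which may come from LANG/LC_ALL
# rather than from LC_COLLATE itself.
message("LC_COLLATE in effect: ", Sys.getlocale("LC_COLLATE"))
message("Attached packages: ", paste(search(), collapse = ", "))
```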
LC_COLLATE in turn affects the sort order (see ?locales). So there is
something in your code (or code in packages that you call) that does
not work when the sort order is different.
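To illustrate the effect, the sketch below sorts the same character vector under C collation and under the session's own collation. The strings are made up; under C collation strings are compared byte by byte, so upper-case letters sort before lower-case ones, while the session result depends on the locale R was started in.

```r
# Same input, two collation settings.
ids <- c("b", "A", "a", "B")

old_collate <- Sys.getlocale("LC_COLLATE")
invisible(Sys.setlocale("LC_COLLATE", "C"))
sorted_c <- sort(ids)             # byte order: upper case before lower case
invisible(Sys.setlocale("LC_COLLATE", old_collate))  # restore previous setting
sorted_session <- sort(ids)       # depends on the locale R was started in

print(sorted_c)        # "A" "B" "a" "b"
print(sorted_session)  # e.g. "a" "A" "b" "B" in an en_US.UTF-8 session
```

Any code that relies on a particular string order (for example, matching two vectors position by position after sorting each) can therefore behave differently under R CMD build than in an interactive session.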
You can debug it by starting R like this:
LC_COLLATE=C R
And then sourcing your file:
source("dummy-workflow.R", echo=TRUE, max=Inf)
This assumes that you've first run
R CMD Stangle dummy-workflow.Rmd
to produce the dummy-workflow.R file.
So this should fail and then you will have all the tools of an
interactive R session (traceback(), sessionInfo(), debug()) available
to troubleshoot the problem.
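Once the session has stopped at the failing step, one way to check whether a suspect expression depends on collation is to evaluate it under both settings and compare. The helper below, collation_sensitive(), is hypothetical (not part of any package) and only a sketch; it takes a zero-argument function so the expression is re-evaluated under each setting.

```r
# Hypothetical helper (not part of any package): returns TRUE if calling
# f() under C collation gives a different result than under the
# session's current collation.
collation_sensitive <- function(f) {
  old <- Sys.getlocale("LC_COLLATE")
  # Restore the original collation even if f() throws an error.
  on.exit(invisible(Sys.setlocale("LC_COLLATE", old)), add = TRUE)

  invisible(Sys.setlocale("LC_COLLATE", "C"))
  result_c <- f()

  invisible(Sys.setlocale("LC_COLLATE", old))
  result_session <- f()

  !identical(result_c, result_session)
}

# Numeric sorting never depends on collation ...
collation_sensitive(function() sort(c(3, 1, 2)))           # FALSE
# ... while character sorting may, depending on the session locale.
collation_sensitive(function() sort(c("b", "A", "a", "B")))
```

In the workflow one might wrap a step such as function() rownames(subset(tableC, adj.P.Val < 0.1)); note that switching collation mid-session approximates, but may not exactly reproduce, starting R with LC_COLLATE=C as above.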
HTH
Dan
Thanks and best wishes, Bernd
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel