[Bioc-devel] dependencies=TRUE problem, affy,gcrma,oligoClasses

8 messages · Keith Satterley, Dan Tenenbaum, Kasper Daniel Hansen +1 more

#
To the Bioconductor developer group,

I emailed the author of the affy package (Rafael Irizarry) and he 
advised me to contact the Bioconductor developers with my problem.

My problem is with the affy package, on which my affylmGUI package 
depends. I only noticed the problem when I tested my program on a 
fresh install of R-2.15.2. When affylmGUI normalizes data using the rma 
function in the affy package, it eventually calls the cdfFromBioC 
function (as coded in getCDFenv.R), which calls "install.packages" 
with the parameter "dependencies=TRUE". This worked fine until 
R-2.15.0, which changed the meaning of the dependencies parameter to 
also include packages mentioned in the "Suggests" field.

Consequently when affy installs a cdf package like "hgu95av2cdf", the 
dependency "AnnotationDbi" is installed, which is not a problem, but 
additionally all the packages in the "Suggests" field of AnnotationDbi 
are also installed. This causes the following to be installed:
'XML', 'BSgenome', 'Rsamtools', 'bitops', 'GenomicRanges', 'Biostrings',
'rtracklayer', 'biomaRt', 'RCurl', 'GenomicFeatures', 'hgu95av2.db',
'GO.db', 'org.Sc.sgd.db', 'org.At.tair.db', 'KEGG.db', 'RUnit',
'TxDb.Hsapiens.UCSC.hg19.knownGene', 'hom.Hs.inp.db', 'org.Hs.eg.db',
'seqnames.db', 'reactome.db', 'AnnotationForge', 'DBI', 'RSQLite' and
'IRanges'.

This is a 1.8 GB download, which would rather ruin a lab lesson if 
it happened during a class! Of course the immediate workaround is to 
install AnnotationDbi before running affylmGUI, but that may not always 
happen.

Therefore, could someone please change line 102 of getCDFenv.R to 
'dependencies=c("Depends", "Imports")' to solve this problem?
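
The requested change can be sketched as follows. This is only an illustration, not the actual code in getCDFenv.R: the helper name install_cdf_package and its installer argument are hypothetical, and the repository URL has to be supplied by the caller. The point is simply that a character vector for dependencies does not rely on whatever TRUE expands to in a given R version.

```r
# Hypothetical helper (not the affy source): install a CDF package while
# stating explicitly which dependency fields may be followed.
install_cdf_package <- function(pkg, repos, lib = .libPaths()[1],
                                installer = utils::install.packages) {
  # "Suggests" is deliberately left out, so optional packages are not requested.
  installer(pkg, lib = lib, repos = repos,
            dependencies = c("Depends", "Imports"))
}
```

The installer argument exists only so the call can be exercised without network access; in normal use it would be left at its default.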

It would be very helpful if you could make the change on R-2.15.2 to 
avoid the above mentioned problems.

After using Itoshi NIKAIDO's source code search engine at 
http://search.bioconductor.jp/ (Thanks for that Itoshi, it is an 
excellent tool), I suspect that 2 other packages would cause similar 
problems. Doing a code search for "dependencies=TRUE" showed that the 
gcrma package (file getPackages.R) and the oligoClasses package (file 
utils-general.R) have this parameter on the install.packages function 
call. Perhaps it would be wise to modify these packages in a similar way.
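
For anyone who wants to repeat this kind of check on a local source checkout rather than through the search engine, a rough sketch is given below. The function name find_dependencies_true is hypothetical, and the pattern only catches the literal form dependencies=TRUE (with optional spaces around the equals sign), so it is a first pass and not a substitute for reading the code.

```r
# Hypothetical helper: report lines in .R files under `dir` that pass
# dependencies=TRUE. Returns NULL when nothing matches.
find_dependencies_true <- function(dir) {
  files <- list.files(dir, pattern = "\\.[Rr]$", recursive = TRUE,
                      full.names = TRUE)
  hits <- lapply(files, function(f) {
    lines <- readLines(f, warn = FALSE)
    idx <- grep("dependencies[[:space:]]*=[[:space:]]*TRUE", lines)
    if (length(idx) == 0) return(NULL)
    data.frame(file = basename(f), line = idx, text = trimws(lines[idx]),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, hits)
}
```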

cheers,

Keith
------------------------------
Keith Satterley
Maintainer of affylmGUI
Bioinformatics Division,
The Walter & Eliza Hall Institute
Melbourne, Australia
-----------------------------


#
Hi Keith,
On 1/22/2013 10:44 PM, Keith wrote:
There are two problems here. First, a normal BioC installation will 
already have AnnotationDbi installed, as this is one of only three core 
packages that are installed by

biocLite()

which is the first step in a 'regular' BioC installation procedure.

Second, if I install BioC and then strip out all of the packages you say 
will be installed, I can't reproduce what you are seeing:

 > x <- c('XML', 'BSgenome', 'Rsamtools', 'bitops', 'GenomicRanges', 
'Biostrings','rtracklayer', 'biomaRt', 'RCurl', 'GenomicFeatures', 
'hgu95av2.db','GO.db', 'org.Sc.sgd.db', 'org.At.tair.db', 'KEGG.db', 
'RUnit','TxDb.Hsapiens.UCSC.hg19.knownGene', 'hom.Hs.inp.db', 
'org.Hs.eg.db','seqnames.db', 'reactome.db', 'AnnotationForge', 'DBI', 
'RSQLite' ,'IRanges', 'AnnotationDbi')
 > sum(x %in% .packages(all.available = TRUE))
[1] 0

So I don't have any of these packages installed, including AnnotationDbi.

 > library(affy)
 > affy:::cdfFromBioC("hgu95av2cdf")
[1] "Attempting to obtain hgu95av2cdf from Bioconductor website"
[1] "Checking to see if package hgu95av2cdf is already installed"
[1] "The environment hgu95av2cdf was not found in these directories: 
/misc/staff/jmacdon/R-devel/library.  Now searching the internet 
repository."
[1] "Checking to see if your internet connection works ..."
also installing the dependencies DBI, RSQLite, IRanges, AnnotationDbi

So I end up installing the cdf, and four other packages.

I think the problem lies elsewhere.

Best,

Jim

  
    
#
On Wed, Jan 23, 2013 at 7:15 AM, James W. MacDonald <jmacdon at uw.edu> wrote:
Starting in BioC 2.12, biocLite() just updates all installed packages.
I could reproduce Keith's problem with R 2.15.2; however, it looks
like it has been fixed in R 2.15.2 patched (I tested 2013-01-22
r61734) and R-devel (I tested 2013-01-22 r61734).

Dan
#
Hi Jim,

thanks very much for testing this. However I get a different outcome to you.

I've done this twice with similar results. I did a fresh R-2.15.2 
install, sourced biocLite, ran biocLite(), deleted AnnotationDbi from 
the library directory.

I then used biocLite to install the affy package. I loaded the affy 
package and then ran the command "affy:::cdfFromBioC("hgu95av2cdf")". 
This installed the dependency "AnnotationDbi" and then reported:
also installing the dependencies 'XML', 'BSgenome', 'Rsamtools', 
'bitops', 'GenomicRanges', 'Biostrings', 'rtracklayer', 'biomaRt', 
'RCurl', 'GenomicFeatures', 'hgu95av2.db', 'GO.db', 'org.Sc.sgd.db', 
'org.At.tair.db', 'KEGG.db', 'RUnit', 
'TxDb.Hsapiens.UCSC.hg19.knownGene', 'hom.Hs.inp.db', 'org.Hs.eg.db', 
'seqnames.db', 'reactome.db', 'AnnotationForge'

The help for the install.packages command says of the dependencies 
parameter:
"TRUE means (as from R 2.15.0) to use c("Depends", "Imports", 
"LinkingTo", "Suggests") for pkgs and c("Depends", "Imports", 
"LinkingTo") for added dependencies".

One would think that added dependencies (like AnnotationDbi) would be 
installed with dependencies of c("Depends", "Imports", "LinkingTo").

This was not the case. With my limited ability to debug these things, I 
think that install.packages calls utils:::getDependencies with 
dependencies set to TRUE and pkgs="hgu95av2cdf", which seems to result 
in dependencies being set by these lines of getDependencies:
if (depends && is.logical(dependencies)) {
         dependencies <- c("Depends", "Imports", "LinkingTo",
             "Suggests")
The call also sets pkgs to "AnnotationDbi" "hgu95av2cdf", which seems 
reasonable.

Later on, install.packages calls utils:::.install.winbinary with pkgs = 
"AnnotationDbi" "hgu95av2cdf" and dependencies=TRUE. The 
.install.winbinary function then calls utils:::getDependencies with 
dependencies set to TRUE, which sets dependencies to c("Depends", 
"Imports", "LinkingTo", "Suggests") again, this time for a pkgs vector 
that already includes AnnotationDbi.

Consequently the Suggested packages are being installed.
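
To make the difference concrete, here is a toy model of the two behaviours. It is not the utils code: the resolve function is hypothetical, the package database is a hand-written and heavily abbreviated stand-in for the real DESCRIPTION files, and only "Depends" and "Suggests" are modelled. The first argument set follows the documented rule (Suggests for the requested package only); the second applies Suggests to added dependencies as well, which is what the second getDependencies call appears to do.

```r
# Toy dependency database (illustrative only, not the real DESCRIPTION fields).
db <- list(
  hgu95av2cdf   = list(Depends = "AnnotationDbi"),
  AnnotationDbi = list(Depends = c("DBI", "RSQLite", "IRanges"),
                       Suggests = c("XML", "GO.db")),
  DBI = list(), IRanges = list(), XML = list(),
  RSQLite = list(Depends = "DBI"),
  GO.db   = list(Depends = "AnnotationDbi")
)

# Hypothetical resolver: `first` fields apply to the requested packages,
# `added` fields apply to every package pulled in after that.
resolve <- function(pkgs, db, first, added) {
  fields_of <- function(p, fields) {
    unique(unlist(db[[p]][fields], use.names = FALSE))
  }
  todo <- unique(unlist(lapply(pkgs, fields_of, fields = first)))
  found <- character(0)
  while (length(todo) > 0) {
    p <- todo[1]
    todo <- todo[-1]
    if (p %in% c(found, pkgs)) next   # already handled
    found <- c(found, p)
    todo <- c(todo, fields_of(p, added))
  }
  found
}

# Documented behaviour: Suggests only for the package that was asked for.
intended <- resolve("hgu95av2cdf", db,
                    first = c("Depends", "Suggests"), added = "Depends")

# Behaviour seen in R 2.15.2: Suggests also followed for added dependencies.
observed <- resolve("hgu95av2cdf", db,
                    first = c("Depends", "Suggests"),
                    added = c("Depends", "Suggests"))

# Packages that are installed only because of the second rule.
extra <- setdiff(observed, intended)
```

In this toy database, extra contains the two suggested packages of AnnotationDbi, which mirrors on a small scale the long list reported above.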

I'm not very sure of my ground here and would appreciate some help from 
the experts.

I have included a summarized copy of my R session, with a sessionInfo() 
at the bottom of it.

I can only think that changing the dependencies value in the cdfFromBioC 
function will fix this.

Appreciate your response,

Keith
************************
R version 2.15.2 (2012-10-26) -- "Trick or Treat"
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-w64-mingw32/x64 (64-bit)

 > source("http://bioconductor.org/biocLite.R")
...
package 'BiocInstaller' successfully unpacked and MD5 sums checked
Bioconductor version 2.11 (BiocInstaller 1.8.3), ?biocLite for help
 > biocLite()
BioC_mirror: http://bioconductor.org
Using Bioconductor version 2.11 (BiocInstaller 1.8.3), R version 2.15.
Installing package(s) 'Biobase' 'IRanges' 'AnnotationDbi'
also installing the dependencies 'BiocGenerics', 'DBI', 'RSQLite'
...
package 'BiocGenerics' successfully unpacked and MD5 sums checked
package 'DBI' successfully unpacked and MD5 sums checked
package 'RSQLite' successfully unpacked and MD5 sums checked
package 'Biobase' successfully unpacked and MD5 sums checked
package 'IRanges' successfully unpacked and MD5 sums checked
package 'AnnotationDbi' successfully unpacked and MD5 sums checked
...
Old packages: 'foreign', 'lattice', 'MASS', 'Matrix', 'nlme', 'rpart', 
'survival'
Update all/some/none? [a/s/n]: n
 > x <- c('XML', 'BSgenome', 'Rsamtools', 'bitops', 'GenomicRanges', 
'Biostrings','rtracklayer', 'biomaRt', 'RCurl', 'GenomicFeatures', 
'hgu95av2.db','GO.db', 'org.Sc.sgd.db', 'org.At.tair.db', 'KEGG.db', 
'RUnit','TxDb.Hsapiens.UCSC.hg19.knownGene', 'hom.Hs.inp.db', 
'org.Hs.eg.db','seqnames.db', 'reactome.db', 'AnnotationForge', 'DBI', 
'RSQLite' ,'IRanges', 'AnnotationDbi')
 > sum(x %in% .packages(all.available = TRUE))
[1] 3
 > biocLite("affy")
BioC_mirror: http://bioconductor.org
Using Bioconductor version 2.11 (BiocInstaller 1.8.3), R version 2.15.
Installing package(s) 'affy'
also installing the dependencies 'affyio', 'preprocessCore', 'zlibbioc'
...
package 'affyio' successfully unpacked and MD5 sums checked
package 'preprocessCore' successfully unpacked and MD5 sums checked
package 'zlibbioc' successfully unpacked and MD5 sums checked
package 'affy' successfully unpacked and MD5 sums checked
...
 > sum(x %in% .packages(all.available = TRUE))
[1] 3
 > library(affy)
 > affy:::cdfFromBioC("hgu95av2cdf")
[1] "Attempting to obtain hgu95av2cdf from Bioconductor website"
[1] "Checking to see if package hgu95av2cdf is already installed"
[1] "The environment hgu95av2cdf was not found in these directories: 
C:/RTest2/R-2.15.2/library.  Now searching the internet repository."
[1] "Checking to see if your internet connection works ..."
also installing the dependency 'AnnotationDbi'

also installing the dependencies 'XML', 'BSgenome', 'Rsamtools', 
'bitops', 'GenomicRanges', 'Biostrings', 'rtracklayer', 'biomaRt', 
'RCurl', 'GenomicFeatures', 'hgu95av2.db', 'GO.db', 'org.Sc.sgd.db', 
'org.At.tair.db', 'KEGG.db', 'RUnit', 
'TxDb.Hsapiens.UCSC.hg19.knownGene', 'hom.Hs.inp.db', 'org.Hs.eg.db', 
'seqnames.db', 'reactome.db', 'AnnotationForge'
...
 > sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_Australia.1252 LC_CTYPE=C 
LC_MONETARY=English_Australia.1252
[4] LC_NUMERIC=C LC_TIME=English_Australia.1252
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods base
other attached packages:
[1] affy_1.36.0         Biobase_2.18.0      BiocGenerics_0.4.0 
BiocInstaller_1.8.3
loaded via a namespace (and not attached):
[1] affyio_1.26.0         preprocessCore_1.20.0 tools_2.15.2          
zlibbioc_1.4.0
 >
************************
On 24/01/2013 2:15 AM, James W. MacDonald wrote:

  
    
#
Hi Keith,
On Wed, Jan 23, 2013 at 6:28 PM, Keith <keith at wehi.edu.au> wrote:
I think this issue is fixed in R-2.15.2-patched and R-devel. Can you
try one of those?

Thanks,
Dan
#
Many people will use a standard 2.15.2 and not the patched version.

For this particular case - the automatic download of CDF packages - I
believe Keith has made a very good point: it is not sensible to
download suggested packages.  I see no reason why the function cannot
be changed in a defensive manner to explicitly state the dependencies
argument, instead of relying on the default behaviour.  I am happy to
take responsibility for changing it.

On a side note, I don't understand why the CDF packages depend on
AnnotationDbi.  I don't see the data structures using anything in
AnnotationDbi, nothing is imported in the NAMESPACE, and nothing
seems to be used in the R code.

Kasper
On Wed, Jan 23, 2013 at 11:09 PM, Dan Tenenbaum <dtenenba at fhcrc.org> wrote:
#
Thanks Dan for the tip. Got it too late to avoid trying to debug the 
problem, but one learns from these exercises.

This patched version of R worked as you predicted.
R version 2.15.2 Patched (2013-01-17 r61672)
Platform: x86_64-w64-mingw32/x64 (64-bit)

Only hgu95av2cdf and AnnotationDbi were installed with the 
affy:::cdfFromBioC("hgu95av2cdf") command.

I agree with Kasper. I still think it would be a good idea to modify 
affy. As I mentioned previously, the gcrma package (file 
getPackages.R) and the oligoClasses package (file utils-general.R) also 
pass this parameter in their install.packages calls. Perhaps it would 
be wise to modify these packages in a similar way.

Keith.
On 24/01/2013 3:31 PM, Kasper Daniel Hansen wrote:

  
    
#
On 1/23/2013 11:31 PM, Kasper Daniel Hansen wrote:
Well, the default in all versions of R except for a few iterations of 
2.15.2 does what is expected, and the changes in 2.15.2 were quickly 
corrected because they introduced this bug. In other words, the default 
behavior is to download just a small subset of the packages, not the 
gobton that Keith was seeing.

However your point about defensive programming is well taken, and I have 
made the requested changes.
The AnnotationDbi dependency in the CDF packages was added (incorrectly 
IMO) late last year. I have now removed the code from makecdfenv, but 
until we rebuild the cdf packages it will still lurk in the existing 
versions.

Best,

Jim