
[Bioc-devel] Issue importing bigwig files with rtracklayer from Amazon Cloud Drive

9 messages · Leonardo Collado Torres, Michael Lawrence

#
Hi Michael,

I have a use case that is similar to
https://support.bioconductor.org/p/81267/#82142 and looks to me like
it might need some changes in rtracklayer to work. That's why I'm
posting it here this time.

Basically, I'm trying to use rtracklayer to import a bigwig file over
the web, but this time the file sits behind a different kind of URL than
before. Using utils::download.file() with the defaults doesn't work; I
have to use method = 'curl' and extra = '-L'.
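A minimal sketch of that download workaround, assuming a Unix-like system with the `curl` command-line tool on the PATH. The helper name `download_bigwig()` and the temporary destination file are illustrative only, not part of any package.

```r
# Hypothetical helper: download a file whose URL redirects (e.g. a duffel.rail.bio
# link that forwards to Amazon Cloud Drive) using the external curl binary.
download_bigwig <- function(url, destfile = tempfile(fileext = ".bw")) {
  utils::download.file(
    url, destfile,
    method = "curl",  # shell out to the curl command-line tool
    extra = "-L"      # curl flag: follow HTTP redirects
  )
  destfile            # return the local path for later import
}

# Usage (requires network access):
# bw <- download_bigwig("http://duffel.rail.bio/recount/DRP000366/bw/DRR000897.bw")
```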

More specifically, the original URL
http://duffel.rail.bio/recount/DRP000366/bw/DRR000897.bw redirects to the
effective URL https://content-na.drive.amazonaws.com/cdproxy/templink/i_aQAPZJkJ9d9lN1NO5DJJtlbpvAdgbNuc1SkqSTHFouFiZq5

Using the second URL with utils::download.file() and the default
method doesn't work either, although it does work in the browser.


As you can see, downloading the file doesn't work out of the box,
so I guess it's not surprising that rtracklayer gives me
errors like:

In seqinfo(ranges) :
  No openssl available in netConnectHttps for
content-na.drive.amazonaws.com : 443

You can find further details (code and log file) at
https://gist.github.com/lcolladotor/c500dd79d49aed1ef33ade5417111453

Thanks,
Leo
#
I can try to add support for URL redirection. For the other error, you
need to get openssl installed and made visible to pkg-config, so that
rtracklayer finds it during its build process.
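One way to make openssl visible to pkg-config before a source build is to extend PKG_CONFIG_PATH in the R session that launches the build. This is a sketch under a stated assumption: OpenSSL was installed with Homebrew (`brew install openssl`), which at the time was typically keg-only on OS X, so its `.pc` files lived under `/usr/local/opt/openssl/lib/pkgconfig`. Adjust the path for other setups.

```r
# Assumption: Homebrew's keg-only OpenSSL; change this for other installations.
ssl_pkgconfig <- "/usr/local/opt/openssl/lib/pkgconfig"

# Prepend that directory to PKG_CONFIG_PATH, keeping any existing entries.
old_path <- Sys.getenv("PKG_CONFIG_PATH")
Sys.setenv(PKG_CONFIG_PATH = paste(c(ssl_pkgconfig, if (nzchar(old_path)) old_path),
                                   collapse = ":"))

# Child processes inherit this setting, so a source build started from this
# session should let rtracklayer's configure script find openssl via pkg-config:
# system2("pkg-config", c("--exists", "openssl"))  # exit status 0 if found
# system2("R", c("CMD", "INSTALL", "rtracklayer"))  # assumes a checked-out source dir
```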

Michael

On Thu, May 5, 2016 at 2:01 PM, Leonardo Collado Torres <lcollado at jhu.edu>
wrote:

  
  
#
Hi Michael,

I forgot about pkg-config (I just did a fresh BioC 3.3 install). I assumed
the OS X binary would work out of the box.

Anyhow, I installed rtracklayer (release) manually from source and got
another error (with a slightly different message now).




$ svn co https://hedgehog.fhcrc.org/bioconductor/branches/RELEASE_3_3/madman/Rpacks/rtracklayer
$ R CMD INSTALL rtracklayer
Loading required package: colorout
* installing to library
‘/Library/Frameworks/R.framework/Versions/3.3release/Resources/library’
* installing *source* package ‘rtracklayer’ ...
checking for pkg-config... /usr/local/bin/pkg-config
checking pkg-config is at least version 0.9.0... yes
checking for OPENSSL... yes
## more output

$ R
+     require('RCurl')
+     opts <- list(
+         followlocation = TRUE,  # resolve redirects
+         ssl.verifyhost = FALSE, # suppress certain SSL errors
+         ssl.verifypeer = FALSE,
+         nobody = TRUE, # perform HEAD request
+         verbose = FALSE
+     )
+     curlhandle <- getCurlHandle(.opts = opts)
+     getURL(uri, curl = curlhandle)
+     info <- getCurlInfo(curlhandle)
+     rm(curlhandle)  # release the curlhandle!
+     info$effective.url
+ }
Loading required package: RCurl
Loading required package: bitops
[1] "https://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5"
Error in seqinfo(ranges) : UCSC library operation failed
In addition: Warning message:
In seqinfo(ranges) :
  Couldn't open
https://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5
Error in seqinfo(ranges) : UCSC library operation failed
In addition: Warning messages:
1: In seqinfo(ranges) :
  TCP non-blocking connect() to content-na.drive.amazonaws.com
timed-out in select() after 10000 milliseconds - Cancelling!
2: In seqinfo(ranges) :
  Couldn't open
http://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5
2016-05-05 17:38:30
Session info -----------------------------------------------------------------------------------------------------------
 setting  value
 version  R version 3.3.0 RC (2016-05-01 r70572)
 system   x86_64, darwin13.4.0
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 tz       America/New_York
 date     2016-05-05

Packages ---------------------------------------------------------------------------------------------------------------
 package              * version  date       source
 Biobase                2.32.0   2016-05-04 Bioconductor
 BiocGenerics         * 0.18.0   2016-05-04 Bioconductor
 BiocParallel           1.6.0    2016-05-04 Bioconductor
 Biostrings             2.40.0   2016-05-04 Bioconductor
 bitops               * 1.0-6    2013-08-17 CRAN (R 3.3.0)
 colorout             * 1.1-2    2016-05-05 Github (jalvesaq/colorout at 6538970)
 devtools               1.11.1   2016-04-21 CRAN (R 3.3.0)
 digest                 0.6.9    2016-01-08 CRAN (R 3.3.0)
 GenomeInfoDb         * 1.8.0    2016-05-04 Bioconductor
 GenomicAlignments      1.8.0    2016-05-04 Bioconductor
 GenomicRanges        * 1.24.0   2016-05-04 Bioconductor
 IRanges              * 2.6.0    2016-05-04 Bioconductor
 memoise                1.0.0    2016-01-29 CRAN (R 3.3.0)
 RCurl                * 1.95-4.8 2016-03-01 CRAN (R 3.3.0)
 Rsamtools              1.24.0   2016-05-04 Bioconductor
 rtracklayer          * 1.32.0   2016-05-05 Bioconductor
 S4Vectors            * 0.10.0   2016-05-04 Bioconductor
 SummarizedExperiment   1.2.0    2016-05-04 Bioconductor
 withr                  1.0.1    2016-02-04 CRAN (R 3.3.0)
 XML                    3.98-1.4 2016-03-01 CRAN (R 3.3.0)
 XVector                0.12.0   2016-05-04 Bioconductor
 zlibbioc               1.18.0   2016-05-04 Bioconductor
On Thu, May 5, 2016 at 5:24 PM, Michael Lawrence
<lawrence.michael at gene.com> wrote:
#
I checked in something that tries to find openssl automatically on the Mac.

It looks like AWS is for some reason returning 404 for the HEAD request
that the UCSC library uses to get info about the file, such as the content
size. The same thing happens when I play around in Firefox's developer tools.
The error response header claims a JSON content type, but no JSON is
actually sent, so there is no further description of the error. I think
this is a bug in Amazon.
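A hedged sketch for checking that behaviour from R: compare the status code of a HEAD request with that of a one-byte ranged GET on the same URL. It assumes the httr package is installed; the function name `compare_head_get()` is hypothetical.

```r
# Hypothetical diagnostic: does the server answer HEAD differently from GET?
compare_head_get <- function(url) {
  head_resp <- httr::HEAD(url)  # httr follows redirects by default
  # Request only the first byte so the GET does not fetch the whole file
  get_resp <- httr::GET(url, httr::add_headers(Range = "bytes=0-0"))
  c(HEAD = httr::status_code(head_resp), GET = httr::status_code(get_resp))
}

# Usage (requires network access). A 404 for HEAD alongside a 200 or 206 for GET
# would match the behaviour described above:
# compare_head_get("http://duffel.rail.bio/recount/DRP000366/bw/DRR000897.bw")
```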

Seems like for now you'll need to download the file first.
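Once the file is on local disk (for example via download.file(method = "curl", extra = "-L"), as described at the start of the thread), rtracklayer can read it without any HTTP requests. A minimal sketch, assuming rtracklayer, GenomicRanges and IRanges are installed; `import_local_bigwig()` is a hypothetical helper that mirrors the `import(file, selection = range, as = "RleList")` call seen in the traceback later in the thread.

```r
# Hypothetical helper: read coverage for one region of a local BigWig file.
import_local_bigwig <- function(path, chr, start, end) {
  stopifnot(file.exists(path))  # fail early if the download did not happen
  region <- GenomicRanges::GRanges(chr, IRanges::IRanges(start, end))
  rtracklayer::import(rtracklayer::BigWigFile(path),
                      selection = rtracklayer::BigWigSelection(region),
                      as = "RleList")
}

# Usage, with `bw` being the path of a downloaded .bw file:
# cov <- import_local_bigwig(bw, "chrY", 1, 1e6)
```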

Michael

On Thu, May 5, 2016 at 2:46 PM, Leonardo Collado Torres <lcollado at jhu.edu>
wrote:

  
  
25 days later
#
Hi Michael,

We tried getting things to work with Amazon Cloud Drive (see Abhi's
efforts at https://github.com/nellore/duffel/commits/master). But we
now have the data hosted elsewhere where the links work properly.

I just noticed a small mistake in rtracklayer:::expandPath(). See:
[1] FALSE
[1] TRUE


The fix is simple. At
https://github.com/Bioconductor-mirror/rtracklayer/blob/c4b842bc4daa4b9db26cb86f3284cf8cf5c32ebd/R/web.R#L62-L66,
change it to:

expandPath <- function(x) {
  if (startsWith(x, "http") || startsWith(x, "ftp"))
    expandURL(x)
  else path.expand(x)
}
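The dispatch in that fix can be checked offline with a stub. In this sketch `expandURL()` is a hypothetical stand-in for rtracklayer's internal function (the real one resolves redirects over the network); the stub only tags its input so you can see which branch was taken.

```r
# Stub standing in for rtracklayer's expandURL(): no network, just tags the input.
expandURL <- function(x) paste0("resolved:", x)

# Corrected dispatch: URLs go through expandURL(), local paths through path.expand().
# startsWith(x, "http") also matches "https".
expandPath <- function(x) {
  if (startsWith(x, "http") || startsWith(x, "ftp")) expandURL(x) else path.expand(x)
}

expandPath("http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw")
# returns "resolved:" followed by the URL
expandPath("/tmp/local.bw")
# returns "/tmp/local.bw"
```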

Best,
Leo

On Thu, May 5, 2016 at 8:10 PM, Michael Lawrence
<lawrence.michael at gene.com> wrote:
#
Thanks for pointing out that buglet. Fixed.

On Tue, May 31, 2016 at 10:55 AM, Leonardo Collado Torres
<lcollado at jhu.edu> wrote:
#
Hi Michael,

Thanks!

Actually, it looks like a few more quick changes are needed. In
https://github.com/Bioconductor-mirror/rtracklayer/blob/917973eb7e9f16bbcd6f6e4b9452f9e40d9a1e94/R/bigWig.R
path.expand() should be replaced with expandPath(). I'm not sure this applies to
all of the current path.expand() calls, but it does at least for
https://github.com/Bioconductor-mirror/rtracklayer/blob/917973eb7e9f16bbcd6f6e4b9452f9e40d9a1e94/R/bigWig.R#L20
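Why the change matters: path.expand() only expands a leading "~", so a URL passes through untouched and the C code behind seqinfo() receives the original, redirecting URL. A small offline check of that base R behaviour, followed by the call from the traceback below shown before and after the suggested change (commented out, since `BWGFile_seqlengths` is rtracklayer-internal).

```r
# path.expand() leaves a URL unchanged, so no redirect gets resolved.
u <- "http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw"
identical(path.expand(u), u)
# returns TRUE

# Before:  .Call(BWGFile_seqlengths, path.expand(path(x)))
# After:   .Call(BWGFile_seqlengths, expandPath(path(x)))
```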

Best,
Leo
2016-05-31 14:11:52 loadCoverage: loading BigWig file
http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw
Error in seqinfo(con) : UCSC library operation failed
In addition: Warning message:
In seqinfo(con) :
  Couldn't open http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw
Timing stopped at: 0.068 0.009 0.817
14: .Call(BWGFile_seqlengths, path.expand(path(x)))
13: seqinfo(con)
12: seqinfo(con)
11: .local(con, format, text, ...)
10: import(file, selection = range, as = "RleList")
9: import(file, selection = range, as = "RleList")
8: FUN(X[[i]], ...)
7: lapply(as.list(X), FUN = FUN, ...)
6: lapply(as.list(X), FUN = FUN, ...)
5: lapply(bList, .loadCoverageBigWig, range = which, chr = chr,
       verbose = verbose)
4: lapply(bList, .loadCoverageBigWig, range = which, chr = chr,
       verbose = verbose)
3: loadCoverage(files = meanFile, chr = chr, chrlen = chrlen)
2: expressed_regions("SRP009615", "chrY", cutoff = 5L)
1: system.time(regions <- expressed_regions("SRP009615", "chrY",
       cutoff = 5L))
Session info -----------------------------------------------------------------------------------------------------------
 setting  value
 version  R version 3.3.0 RC (2016-05-01 r70572)
 system   x86_64, darwin13.4.0
 ui       AQUA
 language (EN)
 collate  en_US.UTF-8
 tz       America/New_York
 date     2016-05-31

Packages ---------------------------------------------------------------------------------------------------------------
 package              * version  date       source
 acepack                1.3-3.3  2014-11-24 CRAN (R 3.3.0)
 AnnotationDbi          1.35.3   2016-05-27 Bioconductor
 Biobase                2.33.0   2016-05-05 Bioconductor
 BiocGenerics         * 0.19.0   2016-05-05 Bioconductor
 BiocParallel           1.7.2    2016-05-20 Bioconductor
 biomaRt                2.29.2   2016-05-30 Bioconductor
 Biostrings             2.41.1   2016-05-27 Bioconductor
 bitops                 1.0-6    2013-08-17 CRAN (R 3.3.0)
 BSgenome               1.41.0   2016-05-05 Bioconductor
 bumphunter             1.13.0   2016-05-05 Bioconductor
 chron                  2.3-47   2015-06-24 CRAN (R 3.3.0)
 cluster                2.0.4    2016-04-18 CRAN (R 3.3.0)
 codetools              0.2-14   2015-07-15 CRAN (R 3.3.0)
 colorspace             1.2-6    2015-03-11 CRAN (R 3.3.0)
 data.table             1.9.6    2015-09-19 CRAN (R 3.3.0)
 DBI                    0.4-1    2016-05-08 CRAN (R 3.3.0)
 derfinder            * 1.7.5    2016-05-20 Bioconductor
 derfinderHelper        1.7.3    2016-05-20 Bioconductor
 devtools               1.11.1   2016-04-21 CRAN (R 3.3.0)
 digest                 0.6.9    2016-01-08 CRAN (R 3.3.0)
 doRNG                  1.6      2014-03-07 CRAN (R 3.3.0)
 foreach                1.4.3    2015-10-13 CRAN (R 3.3.0)
 foreign                0.8-66   2015-08-19 CRAN (R 3.3.0)
 Formula                1.2-1    2015-04-07 CRAN (R 3.3.0)
 GenomeInfoDb         * 1.9.1    2016-05-13 Bioconductor
 GenomicAlignments      1.9.0    2016-05-05 Bioconductor
 GenomicFeatures        1.25.12  2016-05-21 Bioconductor
 GenomicFiles           1.9.7    2016-05-27 Bioconductor
 GenomicRanges        * 1.25.0   2016-05-05 Bioconductor
 ggplot2                2.1.0    2016-03-01 CRAN (R 3.3.0)
 gridExtra              2.2.1    2016-02-29 CRAN (R 3.3.0)
 gtable                 0.2.0    2016-02-26 CRAN (R 3.3.0)
 Hmisc                  3.17-4   2016-05-02 CRAN (R 3.3.0)
 IRanges              * 2.7.1    2016-05-27 Bioconductor
 iterators              1.0.8    2015-10-13 CRAN (R 3.3.0)
 lattice                0.20-33  2015-07-14 CRAN (R 3.3.0)
 latticeExtra           0.6-28   2016-02-09 CRAN (R 3.3.0)
 locfit                 1.5-9.1  2013-04-20 CRAN (R 3.3.0)
 magrittr               1.5      2014-11-22 CRAN (R 3.3.0)
 Matrix                 1.2-6    2016-05-02 CRAN (R 3.3.0)
 matrixStats            0.50.2   2016-04-24 CRAN (R 3.3.0)
 memoise                1.0.0    2016-01-29 CRAN (R 3.3.0)
 munsell                0.4.3    2016-02-13 CRAN (R 3.3.0)
 nnet                   7.3-12   2016-02-02 CRAN (R 3.3.0)
 pkgmaker               0.22     2014-05-14 CRAN (R 3.3.0)
 plyr                   1.8.3    2015-06-12 CRAN (R 3.3.0)
 qvalue                 2.5.2    2016-05-20 Bioconductor
 RColorBrewer           1.1-2    2014-12-07 CRAN (R 3.3.0)
 Rcpp                   0.12.5   2016-05-14 CRAN (R 3.3.0)
 RCurl                  1.95-4.8 2016-03-01 CRAN (R 3.3.0)
 recount              * 0.99.0   2016-05-31 Bioconductor
 registry               0.3      2015-07-08 CRAN (R 3.3.0)
 reshape2               1.4.1    2014-12-06 CRAN (R 3.3.0)
 rngtools               1.2.4    2014-03-06 CRAN (R 3.3.0)
 rpart                  4.1-10   2015-06-29 CRAN (R 3.3.0)
 Rsamtools              1.25.0   2016-05-05 Bioconductor
 RSQLite                1.0.0    2014-10-25 CRAN (R 3.3.0)
 rtracklayer            1.33.2   2016-05-31 Github (Bioconductor-mirror/rtracklayer at 917973e)
 S4Vectors            * 0.11.2   2016-05-27 Bioconductor
 scales                 0.4.0    2016-02-26 CRAN (R 3.3.0)
 stringi                1.0-1    2015-10-22 CRAN (R 3.3.0)
 stringr                1.0.0    2015-04-30 CRAN (R 3.3.0)
 SummarizedExperiment   1.3.2    2016-05-20 Bioconductor
 survival               2.39-4   2016-05-11 CRAN (R 3.3.0)
 VariantAnnotation      1.19.1   2016-05-20 Bioconductor
 withr                  1.0.1    2016-02-04 CRAN (R 3.3.0)
 XML                    3.98-1.4 2016-03-01 CRAN (R 3.3.0)
 xtable                 1.8-2    2016-02-05 CRAN (R 3.3.0)
 XVector                0.13.0   2016-05-05 Bioconductor
 zlibbioc               1.19.0   2016-05-05 Bioconductor




On Tue, May 31, 2016 at 2:02 PM, Michael Lawrence
<lawrence.michael at gene.com> wrote:
#
Sure, done.

On Tue, May 31, 2016 at 11:18 AM, Leonardo Collado Torres
<lcollado at jhu.edu> wrote:
#
Awesome, thanks!

On Tue, May 31, 2016 at 4:11 PM, Michael Lawrence
<lawrence.michael at gene.com> wrote: