[Bioc-devel] Issue importing bigwig files with rtracklayer from Amazon Cloud Drive
Hi Michael,

Thanks! Actually, it looks like a few more quick changes are needed. In https://github.com/Bioconductor-mirror/rtracklayer/blob/917973eb7e9f16bbcd6f6e4b9452f9e40d9a1e94/R/bigWig.R, replace path.expand() with expandPath(). I'm not sure this applies to every current path.expand() call, but it does at least for https://github.com/Bioconductor-mirror/rtracklayer/blob/917973eb7e9f16bbcd6f6e4b9452f9e40d9a1e94/R/bigWig.R#L20

Best,
Leo
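For readers following the thread, the effect of that substitution can be sketched as below. This is only an illustration: expand_path_sketch() and its resolve_url argument are hypothetical names standing in for rtracklayer's internal expandPath() and expandURL(), and the idea that expandURL() resolves redirects is an assumption drawn from the discussion quoted further down, not from the package source.

```r
# Sketch only: expand_path_sketch() and resolve_url are hypothetical names,
# standing in for rtracklayer's internal expandPath() / expandURL().
expand_path_sketch <- function(x, resolve_url = identity) {
    # startsWith() compares literal prefixes (no regex, no alternation),
    # so each URL scheme has to be tested separately.
    if (startsWith(x, "http") || startsWith(x, "ftp")) {
        resolve_url(x)    # URLs are handed to the (assumed) redirect resolver
    } else {
        path.expand(x)    # local paths only get tilde expansion
    }
}

url <- "http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw"

# path.expand() on its own returns a URL unchanged, so no redirect is followed.
identical(path.expand(url), url)

# With a stub resolver, URLs visibly take the other branch.
expand_path_sketch(url, resolve_url = function(u) paste0("resolved:", u))

# Local paths still go through path.expand().
expand_path_sketch("~/mean_SRP009615.bw")
```

The resolver is passed as an argument here purely so the dispatch can be tried without network access.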
library(recount); system.time( regions <- expressed_regions('SRP009615', 'chrY', cutoff = 5L) )
2016-05-31 14:11:52 loadCoverage: loading BigWig file http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw
Error in seqinfo(con) : UCSC library operation failed
In addition: Warning message:
In seqinfo(con) :
  Couldn't open http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw
Timing stopped at: 0.068 0.009 0.817
traceback()
14: .Call(BWGFile_seqlengths, path.expand(path(x)))
13: seqinfo(con)
12: seqinfo(con)
11: .local(con, format, text, ...)
10: import(file, selection = range, as = "RleList")
9: import(file, selection = range, as = "RleList")
8: FUN(X[[i]], ...)
7: lapply(as.list(X), FUN = FUN, ...)
6: lapply(as.list(X), FUN = FUN, ...)
5: lapply(bList, .loadCoverageBigWig, range = which, chr = chr,
verbose = verbose)
4: lapply(bList, .loadCoverageBigWig, range = which, chr = chr,
verbose = verbose)
3: loadCoverage(files = meanFile, chr = chr, chrlen = chrlen)
2: expressed_regions("SRP009615", "chrY", cutoff = 5L)
1: system.time(regions <- expressed_regions("SRP009615", "chrY",
cutoff = 5L))
options(width = 120); devtools::session_info()
Session info -----------------------------------------------------------------------------------------------------------
 setting  value
 version  R version 3.3.0 RC (2016-05-01 r70572)
 system   x86_64, darwin13.4.0
 ui       AQUA
 language (EN)
 collate  en_US.UTF-8
 tz       America/New_York
 date     2016-05-31

Packages ---------------------------------------------------------------------------------------------------------------
 package              * version  date       source
 acepack                1.3-3.3  2014-11-24 CRAN (R 3.3.0)
 AnnotationDbi          1.35.3   2016-05-27 Bioconductor
 Biobase                2.33.0   2016-05-05 Bioconductor
 BiocGenerics         * 0.19.0   2016-05-05 Bioconductor
 BiocParallel           1.7.2    2016-05-20 Bioconductor
 biomaRt                2.29.2   2016-05-30 Bioconductor
 Biostrings             2.41.1   2016-05-27 Bioconductor
 bitops                 1.0-6    2013-08-17 CRAN (R 3.3.0)
 BSgenome               1.41.0   2016-05-05 Bioconductor
 bumphunter             1.13.0   2016-05-05 Bioconductor
 chron                  2.3-47   2015-06-24 CRAN (R 3.3.0)
 cluster                2.0.4    2016-04-18 CRAN (R 3.3.0)
 codetools              0.2-14   2015-07-15 CRAN (R 3.3.0)
 colorspace             1.2-6    2015-03-11 CRAN (R 3.3.0)
 data.table             1.9.6    2015-09-19 CRAN (R 3.3.0)
 DBI                    0.4-1    2016-05-08 CRAN (R 3.3.0)
 derfinder            * 1.7.5    2016-05-20 Bioconductor
 derfinderHelper        1.7.3    2016-05-20 Bioconductor
 devtools               1.11.1   2016-04-21 CRAN (R 3.3.0)
 digest                 0.6.9    2016-01-08 CRAN (R 3.3.0)
 doRNG                  1.6      2014-03-07 CRAN (R 3.3.0)
 foreach                1.4.3    2015-10-13 CRAN (R 3.3.0)
 foreign                0.8-66   2015-08-19 CRAN (R 3.3.0)
 Formula                1.2-1    2015-04-07 CRAN (R 3.3.0)
 GenomeInfoDb         * 1.9.1    2016-05-13 Bioconductor
 GenomicAlignments      1.9.0    2016-05-05 Bioconductor
 GenomicFeatures        1.25.12  2016-05-21 Bioconductor
 GenomicFiles           1.9.7    2016-05-27 Bioconductor
 GenomicRanges        * 1.25.0   2016-05-05 Bioconductor
 ggplot2                2.1.0    2016-03-01 CRAN (R 3.3.0)
 gridExtra              2.2.1    2016-02-29 CRAN (R 3.3.0)
 gtable                 0.2.0    2016-02-26 CRAN (R 3.3.0)
 Hmisc                  3.17-4   2016-05-02 CRAN (R 3.3.0)
 IRanges              * 2.7.1    2016-05-27 Bioconductor
 iterators              1.0.8    2015-10-13 CRAN (R 3.3.0)
 lattice                0.20-33  2015-07-14 CRAN (R 3.3.0)
 latticeExtra           0.6-28   2016-02-09 CRAN (R 3.3.0)
 locfit                 1.5-9.1  2013-04-20 CRAN (R 3.3.0)
 magrittr               1.5      2014-11-22 CRAN (R 3.3.0)
 Matrix                 1.2-6    2016-05-02 CRAN (R 3.3.0)
 matrixStats            0.50.2   2016-04-24 CRAN (R 3.3.0)
 memoise                1.0.0    2016-01-29 CRAN (R 3.3.0)
 munsell                0.4.3    2016-02-13 CRAN (R 3.3.0)
 nnet                   7.3-12   2016-02-02 CRAN (R 3.3.0)
 pkgmaker               0.22     2014-05-14 CRAN (R 3.3.0)
 plyr                   1.8.3    2015-06-12 CRAN (R 3.3.0)
 qvalue                 2.5.2    2016-05-20 Bioconductor
 RColorBrewer           1.1-2    2014-12-07 CRAN (R 3.3.0)
 Rcpp                   0.12.5   2016-05-14 CRAN (R 3.3.0)
 RCurl                  1.95-4.8 2016-03-01 CRAN (R 3.3.0)
 recount              * 0.99.0   2016-05-31 Bioconductor
 registry               0.3      2015-07-08 CRAN (R 3.3.0)
 reshape2               1.4.1    2014-12-06 CRAN (R 3.3.0)
 rngtools               1.2.4    2014-03-06 CRAN (R 3.3.0)
 rpart                  4.1-10   2015-06-29 CRAN (R 3.3.0)
 Rsamtools              1.25.0   2016-05-05 Bioconductor
 RSQLite                1.0.0    2014-10-25 CRAN (R 3.3.0)
 rtracklayer            1.33.2   2016-05-31 Github (Bioconductor-mirror/rtracklayer at 917973e)
 S4Vectors            * 0.11.2   2016-05-27 Bioconductor
 scales                 0.4.0    2016-02-26 CRAN (R 3.3.0)
 stringi                1.0-1    2015-10-22 CRAN (R 3.3.0)
 stringr                1.0.0    2015-04-30 CRAN (R 3.3.0)
 SummarizedExperiment   1.3.2    2016-05-20 Bioconductor
 survival               2.39-4   2016-05-11 CRAN (R 3.3.0)
 VariantAnnotation      1.19.1   2016-05-20 Bioconductor
 withr                  1.0.1    2016-02-04 CRAN (R 3.3.0)
 XML                    3.98-1.4 2016-03-01 CRAN (R 3.3.0)
 xtable                 1.8-2    2016-02-05 CRAN (R 3.3.0)
 XVector                0.13.0   2016-05-05 Bioconductor
 zlibbioc               1.19.0   2016-05-05 Bioconductor

On Tue, May 31, 2016 at 2:02 PM, Michael Lawrence <lawrence.michael at gene.com> wrote:
Thanks for pointing out that buglet. Fixed.

On Tue, May 31, 2016 at 10:55 AM, Leonardo Collado Torres <lcollado at jhu.edu> wrote:
Hi Michael,

We tried getting things to work with Amazon Cloud Drive (see Abhi's efforts at https://github.com/nellore/duffel/commits/master), but we now have the data hosted elsewhere, where the links work properly.

I just noticed a small mistake in rtracklayer:::expandPath(). See:
startsWith('http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw', 'http||ftp')
[1] FALSE
startsWith('http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw', 'http')
[1] TRUE

The fix is simple. At https://github.com/Bioconductor-mirror/rtracklayer/blob/c4b842bc4daa4b9db26cb86f3284cf8cf5c32ebd/R/web.R#L62-L66, change it to:

    expandPath <- function(x) {
        if (startsWith(x, "http") | startsWith(x, "ftp"))
            expandURL(x)
        else path.expand(x)
    }

Best,
Leo

On Thu, May 5, 2016 at 8:10 PM, Michael Lawrence <lawrence.michael at gene.com> wrote:
I checked in something that tries to find openssl automatically on the Mac.

It looks like AWS is for some reason returning 404 for the HEAD request that the UCSC library uses to get info about the file, such as the content size. The same thing happens when I play around in Firefox's developer tools. The error response header claims a JSON content type, but no JSON is actually sent, so there is no further description of the error. I think this is a bug on Amazon's side. It seems that for now you'll need to download the file first.

Michael

On Thu, May 5, 2016 at 2:46 PM, Leonardo Collado Torres <lcollado at jhu.edu> wrote:
Hi Michael,

I forgot about pkg-util (just did a fresh BioC 3.3 install). I assumed the OS X binary would work out of the box. Anyhow, I installed rtracklayer (release) manually and got another error (slightly different message now).

$ svn co https://hedgehog.fhcrc.org/bioconductor/branches/RELEASE_3_3/madman/Rpacks/rtracklayer
$ R CMD INSTALL rtracklayer
Loading required package: colorout
* installing to library '/Library/Frameworks/R.framework/Versions/3.3release/Resources/library'
* installing *source* package 'rtracklayer' ...
checking for pkg-config... /usr/local/bin/pkg-config
checking pkg-config is at least version 0.9.0... yes
checking for OPENSSL... yes
## more output
$ R
library('rtracklayer')
unshorten_url <- function(uri) {
+ require('RCurl')
+ opts <- list(
+ followlocation = TRUE, # resolve redirects
+ ssl.verifyhost = FALSE, # suppress certain SSL errors
+ ssl.verifypeer = FALSE,
+ nobody = TRUE, # perform HEAD request
+ verbose = FALSE
+ )
+ curlhandle <- getCurlHandle(.opts = opts)
+ getURL(uri, curl = curlhandle)
+ info <- getCurlInfo(curlhandle)
+ rm(curlhandle) # release the curlhandle!
+ info$effective.url
+ }
url <-
unshorten_url('http://duffel.rail.bio/recount/DRP000366/bw/DRR000897.bw')
Loading required package: RCurl
Loading required package: bitops
url
x <- import.bw(url, as = 'RleList')
Error in seqinfo(ranges) : UCSC library operation failed
In addition: Warning message:
In seqinfo(ranges) :
  Couldn't open https://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5
x <-
import.bw('http://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5')
Error in seqinfo(ranges) : UCSC library operation failed
In addition: Warning messages:
1: In seqinfo(ranges) :
  TCP non-blocking connect() to content-na.drive.amazonaws.com timed-out in select() after 10000 milliseconds - Cancelling!
2: In seqinfo(ranges) :
  Couldn't open http://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5
## Reproducibility info
message(Sys.time())
2016-05-05 17:38:30
options(width = 120)
devtools::session_info()
Session info -----------------------------------------------------------------------------------------------------------
 setting  value
 version  R version 3.3.0 RC (2016-05-01 r70572)
 system   x86_64, darwin13.4.0
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 tz       America/New_York
 date     2016-05-05

Packages ---------------------------------------------------------------------------------------------------------------
 package              * version  date       source
 Biobase                2.32.0   2016-05-04 Bioconductor
 BiocGenerics         * 0.18.0   2016-05-04 Bioconductor
 BiocParallel           1.6.0    2016-05-04 Bioconductor
 Biostrings             2.40.0   2016-05-04 Bioconductor
 bitops               * 1.0-6    2013-08-17 CRAN (R 3.3.0)
 colorout             * 1.1-2    2016-05-05 Github (jalvesaq/colorout at 6538970)
 devtools               1.11.1   2016-04-21 CRAN (R 3.3.0)
 digest                 0.6.9    2016-01-08 CRAN (R 3.3.0)
 GenomeInfoDb         * 1.8.0    2016-05-04 Bioconductor
 GenomicAlignments      1.8.0    2016-05-04 Bioconductor
 GenomicRanges        * 1.24.0   2016-05-04 Bioconductor
 IRanges              * 2.6.0    2016-05-04 Bioconductor
 memoise                1.0.0    2016-01-29 CRAN (R 3.3.0)
 RCurl                * 1.95-4.8 2016-03-01 CRAN (R 3.3.0)
 Rsamtools              1.24.0   2016-05-04 Bioconductor
 rtracklayer          * 1.32.0   2016-05-05 Bioconductor
 S4Vectors            * 0.10.0   2016-05-04 Bioconductor
 SummarizedExperiment   1.2.0    2016-05-04 Bioconductor
 withr                  1.0.1    2016-02-04 CRAN (R 3.3.0)
 XML                    3.98-1.4 2016-03-01 CRAN (R 3.3.0)
 XVector                0.12.0   2016-05-04 Bioconductor
 zlibbioc               1.18.0   2016-05-04 Bioconductor
On Thu, May 5, 2016 at 5:24 PM, Michael Lawrence <lawrence.michael at gene.com> wrote:
The URL redirection is something I can try to add. For the other error, you need to get openssl installed and made visible to pkg-config, so that rtracklayer finds it during its build process.

Michael

On Thu, May 5, 2016 at 2:01 PM, Leonardo Collado Torres <lcollado at jhu.edu> wrote:
Hi Michael,

I have a use case that is similar to https://support.bioconductor.org/p/81267/#82142 and looks to me like it might need some changes in rtracklayer to work. That's why I'm posting it here this time.

Basically, I'm trying to use rtracklayer to import a bigwig file over the web from a different type of URL than before. Using utils::download.file() with the defaults doesn't work; I have to use method = 'curl' and extra = '-L'.

More specifically, the original URL http://duffel.rail.bio/recount/DRP000366/bw/DRR000897.bw has the effective URL https://content-na.drive.amazonaws.com/cdproxy/templink/i_aQAPZJkJ9d9lN1NO5DJJtlbpvAdgbNuc1SkqSTHFouFiZq5

Using the second URL with utils::download.file() and the default methods doesn't work either, although it does work in the browser.

As you can see, downloading the file doesn't work out of the box. So I guess it's not surprising that with rtracklayer I get errors like:

In seqinfo(ranges) : No openssl available in netConnectHttps for content-na.drive.amazonaws.com : 443

You can find further details (code and log file) at https://gist.github.com/lcolladotor/c500dd79d49aed1ef33ade5417111453

Thanks,
Leo
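Until the remote import works, the download-first route described in this message could look roughly like the sketch below. import_via_local_copy() is a hypothetical helper, not part of rtracklayer; it assumes a command-line curl is available and simply reuses the method = 'curl' and extra = '-L' settings mentioned above before handing the local copy to import.bw().

```r
# Hypothetical helper (not part of rtracklayer): fetch a remote BigWig to a
# local file, following redirects, then import the local copy.
import_via_local_copy <- function(url,
                                  reader = rtracklayer::import.bw,
                                  destfile = tempfile(fileext = ".bw"),
                                  method = "curl",
                                  extra = "-L",
                                  ...) {
    # extra = "-L" asks curl to follow redirects; mode = "wb" keeps the
    # binary file intact on every platform.
    status <- utils::download.file(url, destfile = destfile, method = method,
                                   extra = extra, mode = "wb", quiet = TRUE)
    if (status != 0L) {
        stop("download failed for ", url)
    }
    # Further arguments (e.g. as = "RleList") are passed on to the reader.
    reader(destfile, ...)
}

# Intended use (needs network access, curl and rtracklayer; not run here):
# x <- import_via_local_copy(
#     "http://duffel.rail.bio/recount/DRP000366/bw/DRR000897.bw",
#     as = "RleList"
# )
```

The reader is an argument so the download step can be tried with any function that accepts a file path; the default is only looked up when no reader is supplied.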
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel