Message-ID: <abcf95a7-e5e3-2357-5c70-a67b7d753798@upf.edu>
Date: 2023-03-29T20:08:54Z
From: Robert Castelo
Subject: [Bioc-devel] httr::GET() problem downloading a ExperimentHub resource
In-Reply-To: <BY5PR04MB6627A2C801A4609E3393BB77F9899@BY5PR04MB6627.namprd04.prod.outlook.com>

good catch, but really enigmatic; BAI files work, but BAM files don't:

dat <- read.csv(
  "https://raw.githubusercontent.com/functionalgenomics/gDNAinRNAseqData/devel/inst/extdata/metadata_LiYu22subsetBAMfiles.csv"
)
rdatapath <- strsplit(dat$RDataPath, ":")
bamfiles <- unlist(rdatapath)[seq(1, 18, 2)]
baifiles <- unlist(rdatapath)[seq(2, 18, 2)]

bamurls <- paste0(dat$Location_Prefix, bamfiles)
baiurls <- paste0(dat$Location_Prefix, baifiles)

## BAM files give error
for (bf in bamurls) {
  cat(sprintf("%s\n", basename(bf)))
  tryCatch({
    curl::curl_fetch_disk(bf, tempfile())
  }, error=function(e) message(paste0(e, "\n")))
}

## BAI files do not give error
for (bf in baiurls) {
  cat(sprintf("%s\n", basename(bf)))
  tryCatch({
    curl::curl_fetch_disk(bf, tempfile())
  }, error=function(e) message(paste0(e, "\n")))
}
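
one way to narrow this down might be to compare the response headers the server sends for a BAM file and for its BAI index, without downloading the body. below is a minimal sketch, assuming the 'curl' package is installed; 'fetch_headers()' and 'compare_headers()' are hypothetical helper names (not part of curl or ExperimentHub), and it is only a guess that the headers differ at all:

```r
## Hypothetical helper: fetch only the final response headers with a HEAD
## request (nobody = TRUE), so that no response body has to be written.
fetch_headers <- function(url) {
  h <- curl::new_handle(nobody = TRUE)
  res <- curl::curl_fetch_memory(url, handle = h)
  curl::parse_headers_list(res$headers)  # named list, lowercase header names
}

## Hypothetical helper: return the headers that are missing on one side or
## whose values differ between two named lists of headers.
compare_headers <- function(a, b) {
  keys <- union(names(a), names(b))
  pick <- function(x, k) {
    v <- x[[k]]
    if (is.null(v)) NA_character_ else as.character(v)[1]
  }
  df <- data.frame(
    header = keys,
    bam = vapply(keys, function(k) pick(a, k), character(1), USE.NAMES = FALSE),
    bai = vapply(keys, function(k) pick(b, k), character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
  differs <- is.na(df$bam) | is.na(df$bai) | df$bam != df$bai
  df[differs, , drop = FALSE]
}

## Usage, with bamurls and baiurls as defined above (needs network access):
## compare_headers(fetch_headers(bamurls[1]), fetch_headers(baiurls[1]))
```

a difference in 'content-length' is expected, since the files differ in size; any other row would be a candidate to look at more closely.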

any further ideas?

robert.

On 29/3/23 21:10, Martin Morgan wrote:
>
> Not really helpful but this could be simplified a bit by removing the 
> redirect from experiment hub, and the layer from httr to curl, so
>
> url = "https://functionalgenomics.upf.edu/experimenthub/gdnainrnaseqdata/LiYu22subsetBAMfiles/s32gDNA0.bam"
>
> curl::curl_fetch_disk(url, tempfile())
>
> Error in curl::curl_fetch_disk("https://functionalgenomics.upf.edu/experimenthub/gdnainrnaseqdata/LiYu22subsetBAMfiles/s32gDNA0.bam",  :
>   Failed writing received data to disk/application
>
> I notice the index file (extension .bai) works; do other BAM files 
> work, too?
>
> Martin
>
> *From: *Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of 
> Robert Castelo <robert.castelo at upf.edu>
> *Date: *Wednesday, March 29, 2023 at 1:18 PM
> *To: *bioc-devel at r-project.org <bioc-devel at r-project.org>
> *Subject: *[Bioc-devel] httr::GET() problem downloading a 
> ExperimentHub resource
>
> hi,
>
> we recently added a few new ExperimentHub resources, consisting of BAM
> files and their corresponding BAI files and hosted in my own server.
> while it seems that they are accessible, they cannot be downloaded
> through the ExperimentHub API. the minimum example reproducing the
> problem is this one (using Bioc devel):
>
> library(ExperimentHub)
> httr::GET("https://experimenthub.bioconductor.org/fetch/8129")
> Error in curl::curl_fetch_memory(url, handle = handle) :
>   Failed writing received data to disk/application
>
> while there's apparently no problem to "manually" download the resource
> using 'download.file()' and loading it with
> 'GenomicAlignments::readGAlignments()':
>
> download.file("https://experimenthub.bioconductor.org/fetch/8129",
> "file.bam")
> trying URL 'https://experimenthub.bioconductor.org/fetch/8129'
> Content type 'application/octet-stream' length 13296358 bytes (12.7 MB)
> ==================================================
> downloaded 12.7 MB
>
> gal <- GenomicAlignments::readGAlignments("file.bam")
> gal[1:3]
> GAlignments object with 3 alignments and 0 metadata columns:
>       seqnames strand       cigar    qwidth     start       end     width
>          <Rle>  <Rle> <character> <integer> <integer> <integer> <integer>
>   [1]     chr1      +       49M1S        50     16208     16256        49
>   [2]     chr1      +       3S47M        50     16976     17022        47
>   [3]     chr1      -  10M177N40M        50     17046     17272       227
>           njunc
>       <integer>
>   [1]         0
>   [2]         0
>   [3]         1
>   -------
>   seqinfo: 2580 sequences from an unspecified genome
>
> any hint why 'httr::GET()' fails, while 'download.file()' doesn't?
>
> thanks!!
>
> robert.
> ps: just to clarify, the 'httr::GET()' example is behind the following
> problem:
>
> eh <- ExperimentHub()
> z <- eh[["EH8079"]]
> see ?gDNAinRNAseqData and browseVignettes('gDNAinRNAseqData') for
> documentation
> downloading 2 resources
> retrieving 2 resources
> |======================================================================| 100%
>
> Error: failed to load resource
>   name: EH8079
>   title: RNA-seq data BAM file subset of HRR589632 contaminated with 0% gDNA
>   reason: 1 resources failed to download
> In addition: Warning messages:
> 1: download failed
>   web resource path: ‘https://experimenthub.bioconductor.org/fetch/8129’
>   local file path: ‘/home/rcastelo/.cache/R/ExperimentHub/12ba1aa03_8129’
>   reason: Failed writing received data to disk/application
> 2: bfcadd() failed; resource removed
>   rid: BFC3
>   fpath: ‘https://experimenthub.bioconductor.org/fetch/8129’
>   reason: download failed
> 3: download failed
>   hub path: ‘https://experimenthub.bioconductor.org/fetch/8129’
>   cache resource: ‘EH8079 : 8129’
>   reason: bfcadd() failed; see warnings()
>
>
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

-- 
Robert Castelo, PhD
Associate Professor
Dept. of Medicine and Life Sciences
Universitat Pompeu Fabra (UPF)
Barcelona Biomedical Research Park (PRBB)
Dr Aiguader 88
E-08003 Barcelona, Spain
telf: +34.933.160.514
