Issues with libcurl + HTTP status codes (eg. 403, 404)
Thanks for looking into this so promptly!
Should users expect the behaviour to be congruent across all of the
supported external programs (curl, wget) as well? E.g.
URL <- "http://cran.rstudio.org/no/such/file/here.tar.gz"
download <- function(file, method, ...)
print(download.file(file, destfile = tempfile(), method = method, ...))
download(URL, method = "internal") ## error
download(URL, method = "curl") ## status code 0
download(URL, method = "wget") ## warning (status code 8)
download(URL, method = "libcurl") ## status code 0
It seems unfortunate that the behaviour differs across each method; at
least in my mind `download.file()` should be a unified interface that
tries to do the 'same thing' regardless of the chosen method.
FWIW, one can force 'curl' to fail on HTTP error codes (-f) and this
can be passed down by R, e.g.
download(URL, method = "curl", extra = "-f") ## warning (status code 22)
but I still think this should be promoted to an error rather than a
warning. (Of course, changing that would imply a backwards
incompatible change; however, I think it would be the correct change).
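
In the meantime, the promotion can be done on the caller's side. Below is a minimal sketch, not anything that exists in R: `strict_download()` and its `downloader` argument are hypothetical names, and `downloader` is there only so the wrapper can be tried without a network. It turns any warning or non-zero status code from `download.file()` into an error and removes the destination file on failure.

```r
# Hypothetical wrapper (not part of R): make every method fail the same way.
# `downloader` is an assumption added so the sketch can be exercised offline.
strict_download <- function(url, destfile, method, ...,
                            downloader = utils::download.file) {
  status <- tryCatch(
    downloader(url, destfile = destfile, method = method, ...),
    warning = function(w) {
      unlink(destfile)  # do not leave a partial or empty file behind
      stop("download failed: ", conditionMessage(w), call. = FALSE)
    },
    error = function(e) {
      unlink(destfile)
      stop(e)           # re-raise the original error unchanged
    }
  )
  # Some methods report failure only through a non-zero return value.
  if (!identical(as.integer(status), 0L)) {
    unlink(destfile)
    stop("download failed with status code ", status, call. = FALSE)
  }
  invisible(destfile)
}
```

With the test case above this would be called as `strict_download(URL, tempfile(), method = "curl", extra = "-f")`. Note that it cannot help when a method returns status code 0 for an HTTP error, as method = "curl" does without `-f`.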
(PS: I just tested r69197 and method = "libcurl" does indeed report an
error now in the above test case on my system [OS X]; thanks!)
Kevin
On Thu, Aug 27, 2015 at 10:27 AM, Martin Morgan <mtmorgan at fredhutch.org> wrote:
R-devel r69197 returns appropriate errors for the cases below; I know of a few
rough edges

- ftp error codes are not reported correctly
- download.file creates destfile before discovering that http fails, leaving
  an empty file on disk

and am happy to hear of more.

Martin

On 08/27/2015 08:46 AM, Jeroen Ooms wrote:
On Thu, Aug 27, 2015 at 5:16 PM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
Probably I'm confused now...
Both R-patched and R-devel give an error (after a *long* wait!)
for
download.file("https://someserver.com/mydata.csv", "mydata.csv")
So that problem is I think solved now.
I'm sorry for the confusion, this was a hypothetical example.
Connection failures are different from HTTP status errors. Below are some
real examples of servers returning HTTP errors. For each example the
"internal" method correctly raises an R error, whereas the "libcurl"
method does not.
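
One way to make the distinction explicit is to look at the HTTP status code directly. This is a minimal sketch under stated assumptions: `check_http_status()` and its `get_headers` argument are hypothetical names, and the default relies on base R's `curlGetHeaders()`, which needs an R build with libcurl support and returns the response headers with an integer "status" attribute. A connection failure surfaces as an error from `curlGetHeaders()` itself, while an HTTP error shows up as a status of 400 or above.

```r
# Hypothetical helper: turn an HTTP status >= 400 into an R error.
# `get_headers` is injectable (an assumption) so this can be tried offline.
check_http_status <- function(url, get_headers = curlGetHeaders) {
  headers <- get_headers(url)       # errors here mean a connection failure
  status <- attr(headers, "status")
  if (is.null(status) || is.na(status))
    stop("could not determine HTTP status for ", url, call. = FALSE)
  if (status >= 400L)
    stop(sprintf("HTTP status %d for %s", status, url), call. = FALSE)
  invisible(status)
}
```

Calling `check_http_status()` on a URL before `download.file()` would then give the same failure regardless of the download method, at the cost of one extra request.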
# File not found (404)
download.file("http://httpbin.org/data.csv", "data.csv", method = "internal")
download.file("http://httpbin.org/data.csv", "data.csv", method = "libcurl")
readLines(url("http://httpbin.org/data.csv", method = "internal"))
readLines(url("http://httpbin.org/data.csv", method = "libcurl"))
# Unauthorized (401)
download.file("https://httpbin.org/basic-auth/user/passwd", "data.csv",
              method = "internal")
download.file("https://httpbin.org/basic-auth/user/passwd", "data.csv",
              method = "libcurl")
readLines(url("https://httpbin.org/basic-auth/user/passwd", method = "internal"))
readLines(url("https://httpbin.org/basic-auth/user/passwd", method = "libcurl"))
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793