On 7/28/06, Seth Falcon <sfalcon at fhcrc.org> wrote:
I have a rough draft patch, see below, that adds a User-Agent header
to HTTP requests made in R via download.file. If there is interest, I
will polish it.
It looks right, but I am running under Windows without a compiler.
I wonder if it would not be better to make the user agent string
something that is configurable (at the time R is built) rather than at
run time. This would make Seth's patch about 1% as long. Or this could
be handled as an option. The patches are pretty extensive and allow for
setting the agent header by setting parameters in function calls (e.g.
download.file). I am not sure there is a good use case for that level
of flexibility and the additional code is substantial.
The issue that I think arises is that there are potentially other
systems that will be unhappy with R's identification of itself and so
some users may also need to turn it off.
Any strong opinions?
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
I also thought that there was no need for this level of complexity.
(BTW, some of the patch is changes Seth has made for other purposes, e.g.
that to memory.c, so please no one apply all of it.)
I'd be happy for R to just identify itself as 'R', which seems allowed:
(http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html). But I am a bit
concerned that sites may not just require the field but also require a
particular format (even though W3C does not).
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
For those of us who want to monitor downloads and get an idea of
the size of the user base for different platforms (which helps to
allocate resources), I think that we should try to include a bit more
information.
I could probably live with as little as R version, but would like to
have OS there as well...
best wishes
Robert
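As a rough illustration of the download monitoring described above, here is a minimal C sketch of how a server-side log tool might pull the R version and OS out of an agent string. It assumes the string follows the default proposed later in this thread, "R (<version> <platform> <arch> <os>)"; the struct `r_agent` and the function `parse_r_agent` are hypothetical names, not part of R or of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the four fields of the proposed agent string. */
struct r_agent {
    char version[16];
    char platform[64];
    char arch[32];
    char os[32];
};

/* Returns 1 if ua looks like "R (<version> <platform> <arch> <os>)",
 * filling in out; returns 0 for anything else (e.g. a bare "R" or "lynx").
 * Field widths in the format keep sscanf from overflowing the buffers. */
int parse_r_agent(const char *ua, struct r_agent *out)
{
    assert(ua != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    return sscanf(ua, "R (%15s %63s %31s %31[^)])",
                  out->version, out->platform, out->arch, out->os) == 4;
}
```

A caller would pass each logged User-Agent value to `parse_r_agent` and tally `version` and `os` only when it returns 1. Agents that users have overridden simply fail to parse, so they would have to be counted separately.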
On 7/28/06, Robert Gentleman <rgentlem at fhcrc.org> wrote:
Any strong opinions?
Actually two:
1) If you wish to pull down (read: extract from HTML or similar) live
data from the web, you might want to be able to "imitate" a certain
browser. For instance, if you tell some web server you're a simple
"mobile phone" or "lynx", you might be able to get back very clean data.
Some servers might also block unknown web browsers.
2) If the web server of a package repository decided to make use of
the user-agent string to decide what version of the repository it
should deliver, I would like to be able to trick the server. Why?
Many times I have found myself working on a system where I do not have
the rights to update to the latest or the developer version of R.
However, even without the very latest version of R, you can still do
work. For instance, in Bioconductor, biocLite() & co. give you
either the stable or the developer version of Bioconductor depending on
your R version, but looking into the biocLite() code and beyond, you find
that you actually can install a Bioconductor v1.9 package in R v2.3.1.
It can be risky business, but if you know what you're doing, it can
save your day (or week).
Cheers
Henrik
OK, that suggests setting at the options level would solve both of your
problems and that seems like the best approach. I don't really want to
pass this around as a parameter through the maze of functions that might
actually download something if we don't have to.
I think we can provide something early next week on R-devel for folks to
test. But I suspect, as Henrik also does, that the set of sites that will
refuse us with a User-Agent header will be much larger than the set
James has found that refuse us without it.
best wishes
Robert
Prof Brian Ripley <ripley at stats.ox.ac.uk> writes:
I also thought that there was no need for this level of complexity.
(BTW, some of the patch is changes Seth has made for other purposes, e.g.
that to memory.c, so please no one apply all of it.)
*blush* sorry about that. I made the final diff from a src tree on
another machine and it was dirty. memory.c should NOT have been
touched by my patch.
I'd be happy for R to just identify itself as 'R', which seems allowed:
(http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html). But I am a bit
concerned that sites may not just require the field but also require a
particular format (even though W3C does not).
As long as it is going to identify itself, I think there is value in
having it provide version, platform, OS info.
Given the concern that some sites that currently work may stop working
(example?), perhaps making this a global option is a good compromise.
The option could be httpRequestHeader and the default value would be
as proposed in my patch. A NULL value would result in the current
behavior, no extra header info in the request.
Robert Gentleman <rgentlem at fhcrc.org> writes:
OK, that suggests setting at the options level would solve both of your
problems and that seems like the best approach. I don't really want to
pass this around as a parameter through the maze of functions that might
actually download something if we don't have to.
I have an updated patch that adds an HTTPUserAgent option. The
default is a string like:
R (2.4.0 x86_64-unknown-linux-gnu x86_64 linux-gnu)
If the HTTPUserAgent option is NULL, no user agent header is added to
HTTP requests (this is the current behavior). This option allows R to
use an arbitrary user agent header.
The patch adds two non-exported functions to utils:
1) defaultUserAgent - returns a string like above
2) makeUserAgent - formats content of HTTPUserAgent option for use
as part of an HTTP request header.
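To make the formatting step concrete, here is a minimal C sketch of the same logic as makeUserAgent: a NULL agent means "send no header", and anything else becomes one CRLF-terminated header line. The function name `format_user_agent` and its return convention are assumptions for illustration only; the patch itself does this formatting in R, not in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not in the patch): write "User-Agent: <agent>\r\n"
 * into buf. Returns the header length, 0 when agent is NULL (no header is
 * sent, which is the pre-patch behaviour), or -1 if buf is too small. */
int format_user_agent(const char *agent, char *buf, size_t buflen)
{
    int n;

    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';
    if (agent == NULL)
        return 0;                 /* option is NULL: add nothing */
    n = snprintf(buf, buflen, "User-Agent: %s\r\n", agent);
    if (n < 0 || (size_t) n >= buflen) {
        buf[0] = '\0';            /* refuse to emit a truncated header */
        return -1;
    }
    return n;
}
```

With a 64-byte buffer, calling `format_user_agent("R (2.4.0)", buf, sizeof buf)` leaves the single line `User-Agent: R (2.4.0)` followed by CRLF in `buf`. Rejecting truncated output rather than sending it is a design choice of this sketch, since a header cut off before its CRLF would corrupt the request.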
I've tested on OS X and Linux, but not on Windows. When USE_WININET is
defined, a user agent string of "R" was already being used. With this
patch, the HTTPUserAgent option is used instead. I'm unsure if NULL is
allowed.
Also, in src/main/internet.c there is a comment:
"Next 6 are for use by libxml, only"
and then a definition for R_HTTPOpen. Not sure how/when these get
used. The user agent for these calls remains unspecified with this
patch.
+ seth
Patch summary:
src/include/R_ext/R-ftp-http.h | 2 +-
src/include/Rmodules/Rinternet.h | 2 +-
src/library/base/man/options.Rd | 5 +++++
src/library/utils/R/readhttp.R | 25 +++++++++++++++++++++++++
src/library/utils/R/zzz.R | 3 ++-
src/main/internet.c | 2 +-
src/modules/internet/internet.c | 37 +++++++++++++++++++++++++------------
src/modules/internet/nanohttp.c | 8 ++++++--
8 files changed, 66 insertions(+), 18 deletions(-)
Index: src/include/R_ext/R-ftp-http.h
===================================================================
--- src/include/R_ext/R-ftp-http.h (revision 38715)
+++ src/include/R_ext/R-ftp-http.h (working copy)
@@ -36,7 +36,7 @@
int R_FTPRead(void *ctx, char *dest, int len);
void R_FTPClose(void *ctx);
-void * RxmlNanoHTTPOpen(const char *URL, char **contentType, int cacheOK);
+void * RxmlNanoHTTPOpen(const char *URL, char **contentType, const char *headers, int cacheOK);
int RxmlNanoHTTPRead(void *ctx, void *dest, int len);
void RxmlNanoHTTPClose(void *ctx);
int RxmlNanoHTTPReturnCode(void *ctx);
Index: src/include/Rmodules/Rinternet.h
===================================================================
--- src/include/Rmodules/Rinternet.h (revision 38715)
+++ src/include/Rmodules/Rinternet.h (working copy)
@@ -9,7 +9,7 @@
typedef Rconnection (*R_NewUrlRoutine)(char *description, char *mode);
typedef Rconnection (*R_NewSockRoutine)(char *host, int port, int server, char *mode);
-typedef void * (*R_HTTPOpenRoutine)(const char *url, const int cacheOK);
+typedef void * (*R_HTTPOpenRoutine)(const char *url, const char *headers, const int cacheOK);
typedef int (*R_HTTPReadRoutine)(void *ctx, char *dest, int len);
typedef void (*R_HTTPCloseRoutine)(void *ctx);
Index: src/main/internet.c
===================================================================
--- src/main/internet.c (revision 38715)
+++ src/main/internet.c (working copy)
@@ -129,7 +129,7 @@
{
if(!initialized) internet_Init();
if(initialized > 0)
- return (*ptr->HTTPOpen)(url, 0);
+ return (*ptr->HTTPOpen)(url, NULL, 0);
else {
error(_("internet routines cannot be loaded"));
return NULL;
Index: src/library/utils/R/zzz.R
===================================================================
--- src/library/utils/R/zzz.R (revision 38715)
+++ src/library/utils/R/zzz.R (working copy)
@@ -9,7 +9,8 @@
internet.info = 2,
pkgType = .Platform$pkgType,
str = list(strict.width = "no"),
- example.ask = "default")
+ example.ask = "default",
+ HTTPUserAgent = defaultUserAgent())
extra <-
if(.Platform$OS.type == "windows") {
list(mailer = "none",
Index: src/library/utils/R/readhttp.R
===================================================================
--- src/library/utils/R/readhttp.R (revision 38715)
+++ src/library/utils/R/readhttp.R (working copy)
@@ -6,3 +6,28 @@
stop("transfer failure")
file.show(file, delete.file = delete.file, title = title, ...)
}
+
+
+
+defaultUserAgent <- function()
+{
+ Rver <- paste(R.version$major, R.version$minor, sep=".")
+ Rdetails <- paste(Rver, R.version$platform, R.version$arch,
+ R.version$os)
+ paste("R (", Rdetails, ")", sep="")
+}
+
+
+makeUserAgent <- function(format = TRUE) {
+ agent <- getOption("HTTPUserAgent")
+ if (is.null(agent)) {
+ return(NULL)
+ }
+ if (length(agent) != 1)
+ stop(sQuote("HTTPUserAgent"),
+ " option must be a length one character vector or NULL")
+ if (format)
+ paste("User-Agent: ", agent[1], "\r\n", sep = "")
+ else
+ agent[1]
+}
Index: src/library/base/man/options.Rd
===================================================================
--- src/library/base/man/options.Rd (revision 38715)
+++ src/library/base/man/options.Rd (working copy)
@@ -368,6 +368,11 @@
\item{\code{help.try.all.packages}:}{default for an argument of
\code{\link{help}}.}
+ \item{\code{HTTPUserAgent}:}{string used as the user agent in HTTP
+ requests. If \code{NULL}, HTTP requests will be made without a
+ user agent header. The default is \code{R (<version> <platform>
+ <arch> <os>)}}
+
\item{\code{internet.info}:}{The minimum level of information to be
printed on URL downloads etc. Default is 2, for failure causes.
Set to 1 or 0 to get more information.}
Index: src/modules/internet/internet.c
===================================================================
--- src/modules/internet/internet.c (revision 38715)
+++ src/modules/internet/internet.c (working copy)
@@ -28,7 +28,7 @@
#include <Rconnections.h>
#include <R_ext/R-ftp-http.h>
-static void *in_R_HTTPOpen(const char *url, const int cacheOK);
+static void *in_R_HTTPOpen(const char *url, const char *headers, const int cacheOK);
static int in_R_HTTPRead(void *ctx, char *dest, int len);
static void in_R_HTTPClose(void *ctx);
@@ -70,7 +70,7 @@
switch(type) {
case HTTPsh:
- ctxt = in_R_HTTPOpen(url, 0);
+ ctxt = in_R_HTTPOpen(url, NULL, 0);
if(ctxt == NULL) {
/* if we call error() we get a connection leak*/
/* so do_url has to raise the error*/
@@ -238,14 +238,14 @@
}
#endif
-/* download(url, destfile, quiet, mode, cacheOK) */
+/* download(url, destfile, quiet, mode, headers, cacheOK) */
#define CPBUFSIZE 65536
#define IBUFSIZE 4096
static SEXP in_do_download(SEXP call, SEXP op, SEXP args, SEXP env)
{
- SEXP ans, scmd, sfile, smode;
- char *url, *file, *mode;
+ SEXP ans, scmd, sfile, smode, sheaders, agentFun;
+ char *url, *file, *mode, *headers;
int quiet, status = 0, cacheOK;
checkArity(op, args);
@@ -271,6 +271,17 @@
cacheOK = asLogical(CAR(args));
if(cacheOK == NA_LOGICAL)
error(_("invalid '%s' argument"), "cacheOK");
+#ifdef USE_WININET
+ PROTECT(agentFun = lang2(install("makeUserAgent"), ScalarLogical(0)));
+#else
+ PROTECT(agentFun = lang1(install("makeUserAgent")));
+#endif
+ PROTECT(sheaders = eval(agentFun, R_FindNamespace(mkString("utils"))));
+ UNPROTECT(1);
+ if(TYPEOF(sheaders) == NILSXP)
+ headers = NULL;
+ else
+ headers = CHAR(STRING_ELT(sheaders, 0));
#ifdef Win32
if (!pbar.wprog) {
pbar.wprog = newwindow(_("Download progress"), rect(0, 0, 540, 100),
@@ -319,7 +330,7 @@
#ifdef Win32
R_FlushConsole();
#endif
- ctxt = in_R_HTTPOpen(url, cacheOK);
+ ctxt = in_R_HTTPOpen(url, headers, cacheOK);
if(ctxt == NULL) status = 1;
else {
if(!quiet) REprintf(_("opened URL\n"), url);
@@ -466,14 +477,14 @@
PROTECT(ans = allocVector(INTSXP, 1));
INTEGER(ans)[0] = status;
- UNPROTECT(1);
+ UNPROTECT(2);
return ans;
}
#if defined(SUPPORT_LIBXML) && !defined(USE_WININET)
-void *in_R_HTTPOpen(const char *url, int cacheOK)
+void *in_R_HTTPOpen(const char *url, const char *headers, const int cacheOK)
{
inetconn *con;
void *ctxt;
@@ -484,7 +495,7 @@
if(timeout == NA_INTEGER || timeout <= 0) timeout = 60;
RxmlNanoHTTPTimeout(timeout);
- ctxt = RxmlNanoHTTPOpen(url, NULL, cacheOK);
+ ctxt = RxmlNanoHTTPOpen(url, NULL, headers, cacheOK);
if(ctxt != NULL) {
int rc = RxmlNanoHTTPReturnCode(ctxt);
if(rc != 200) {
@@ -605,7 +616,8 @@
}
#endif /* USE_WININET_ASYNC */
-static void *in_R_HTTPOpen(const char *url, const int cacheOK)
+static void *in_R_HTTPOpen(const char *url, const char *headers,
+ const int cacheOK)
{
WIctxt wictxt;
DWORD status, d1 = 4, d2 = 0, d3 = 100;
@@ -622,7 +634,7 @@
wictxt->length = -1;
wictxt->type = NULL;
wictxt->hand =
- InternetOpen("R", INTERNET_OPEN_TYPE_PRECONFIG, NULL, NULL,
+ InternetOpen(headers, INTERNET_OPEN_TYPE_PRECONFIG, NULL, NULL,
#ifdef USE_WININET_ASYNC
INTERNET_FLAG_ASYNC
#else
@@ -870,7 +882,8 @@
#endif
#ifndef HAVE_INTERNET
-static void *in_R_HTTPOpen(const char *url, const int cacheOK)
+static void *in_R_HTTPOpen(const char *url, const char *headers,
+ const int cacheOK)
{
return NULL;
}
Index: src/modules/internet/nanohttp.c
===================================================================
--- src/modules/internet/nanohttp.c (revision 38715)
+++ src/modules/internet/nanohttp.c (working copy)
@@ -1034,6 +1034,9 @@
* @contentType: if available the Content-Type information will be
* returned at that location
*
+ * @headers: headers to be used in the HTTP request. These must be name/value
+ * pairs separated by ':', each on their own line.
+ *
* This function try to open a connection to the indicated resource
* via HTTP GET.
*
@@ -1042,10 +1045,11 @@
*/
void*
-RxmlNanoHTTPOpen(const char *URL, char **contentType, int cacheOK)
+RxmlNanoHTTPOpen(const char *URL, char **contentType, const char *headers,
+ int cacheOK)
{
if (contentType != NULL) *contentType = NULL;
- return RxmlNanoHTTPMethod(URL, NULL, NULL, contentType, NULL, cacheOK);
+ return RxmlNanoHTTPMethod(URL, NULL, NULL, contentType, headers, cacheOK);
}
/**
should appear at an R-devel near you...
thanks Seth