RCurl unable to download a particular web page -- what is so special about this web page?

Thank you. The output I get from that example is below:
+          debugfunction = d$update, verbose = TRUE )
[1] ""
text
"About to connect() to uk.youtube.com port 80 (#0)\n  Trying
208.117.236.72... connected\nConnected to uk.youtube.com
(208.117.236.72) port 80 (#0)\nConnection #0 to host uk.youtube.com
left intact\n"
 
headerIn
"HTTP/1.1 400 Bad Request\r\nVia: 1.1 PFO-FIREWALL\r\nConnection: Keep-
Alive\r\nProxy-Connection: Keep-Alive\r\nTransfer-Encoding: chunked\r
\nExpires: Tue, 27 Apr 1971 19:44:06 EST\r\nDate: Tue, 27 Jan 2009
15:31:25 GMT\r\nContent-Type: text/plain\r\nServer: Apache\r\nX-
Content-Type-Options: nosniff\r\nCache-Control: no-cache\r
\nCneonction: close\r\n\r\n"
 
headerOut
"GET / HTTP/1.1\r\nHost: uk.youtube.com\r\nAccept: */*\r\n\r\n"
 
dataIn
"0\r\n\r\n"
 
dataOut
""
So the critical information from this is the '400 Bad Request'. A
Google search defines this for me as:

    The request could not be understood by the server due to malformed
    syntax. The client SHOULD NOT repeat the request without
    modifications.
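
For reference, the status can also be pulled out of the headerIn text
programmatically rather than read by eye. The following is only a sketch
in base R: the names header_in, header_lines, status_code and headers are
made up for illustration, and the header string is an abbreviated copy of
the output above with the line wraps rejoined.

```r
# Abbreviated copy of the headerIn text shown above (line wraps rejoined).
header_in <- paste0(
  "HTTP/1.1 400 Bad Request\r\nVia: 1.1 PFO-FIREWALL\r\n",
  "Connection: Keep-Alive\r\nProxy-Connection: Keep-Alive\r\n",
  "Transfer-Encoding: chunked\r\nContent-Type: text/plain\r\n",
  "Server: Apache\r\n\r\n"
)

# Split on CRLF and drop the empty entry left by the blank terminator line.
header_lines <- strsplit(header_in, "\r\n", fixed = TRUE)[[1]]
header_lines <- header_lines[nzchar(header_lines)]

# The first line is the status line: "HTTP/<version> <code> <reason>".
status_parts <- regmatches(
  header_lines[1],
  regexec("^HTTP/([0-9.]+) ([0-9]{3}) ?(.*)$", header_lines[1])
)[[1]]
status_code   <- as.integer(status_parts[3])
status_reason <- status_parts[4]

# The remaining lines are "Name: value" pairs.
fields  <- header_lines[-1]
headers <- setNames(sub("^[^:]*:[ ]*", "", fields), sub(":.*$", "", fields))

# 4xx codes mean the request itself was rejected.
is_client_error <- status_code >= 400 && status_code < 500

print(status_code)
print(status_reason)
print(headers[["Via"]])
```

Here status_code and status_reason hold the two halves of the status
line, and headers[["Via"]] gives the intermediary that relayed the reply.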


Looking through both sort(listCurlOptions()) and
http://curl.haxx.se/libcurl/c/curl_easy_setopt.htm doesn't really
help me this time (unless I missed something). Any advice?

Thank you for your time,
C.C

P.S. I can get the download to work if I use:
[1] "<html>, \t<head>, \t\t<title>OpenDNS</title>, \t</head>, ,
\t<body id=\"mainbody\" onLoad=\"testforbanner();\" style=\"margin:
0px;\">, \t\t<script language=\"JavaScript\">, \t\t\tfunction
testforbanner() {, \t\t\t\tvar width;, \t\t\t\tvar height;, \t\t\t
\tvar x = 0;, \t\t\t\tvar isbanner = false;, \t\t\t\tvar bannersizes =
new Array(16), \t\t\t\tbannersizes[0] = [etc]

        
On 27 Jan, 13:52, Duncan Temple Lang <dun... at wald.ucdavis.edu> wrote: