An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110429/57cdd362/attachment.pl>
read.csv fails to read a CSV file from google docs
8 messages · Tal Galili, David Winsemius, William Dunlap +2 more
On Apr 29, 2011, at 11:19 AM, Tal Galili wrote:
Hello all, I wish to use read.csv to read a google doc spreadsheet. I try using the following code: data_url <- " http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv " read.csv(data_url) Which results in the following error: Error in file(file, "rt") : cannot open the connection I'm on windows 7. And the code was tried on R 2.12 and 2.13 I remember trying this a few months ago and it worked fine.
I am always amused at such claims. Occasionally they are correct, but
more often a crucial step has been omitted. In this case you have at a
minimum embedded line-feeds in your URL string and have not
established a connection, so it could not possibly have succeeded as
presented.
But now it's time to admit I do not know why it is not succeeding when
I correct those flaws.
> closeAllConnections()
> data_url <- url("http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv
")
> read.csv(data_url)
Error in open.connection(file, "rt") : cannot open the connection
> closeAllConnections()
> dd <- read.csv(con <- url("http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv
"))
Error in open.connection(file, "rt") : cannot open the connection
So, I guess I'm not reading the help pages for `url` and `read.csv` as
well I thought I was.
Any suggestion what might be causing this or how to solve it?
David Winsemius, MD West Hartford, CT
-----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of David Winsemius Sent: Friday, April 29, 2011 10:36 AM To: Tal Galili Cc: r-help at r-project.org Subject: Re: [R] read.csv fails to read a CSV file from google docs On Apr 29, 2011, at 11:19 AM, Tal Galili wrote:
Hello all, I wish to use read.csv to read a google doc spreadsheet. I try using the following code: data_url <- "
http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&ke y=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid =0&output=csv
" read.csv(data_url) Which results in the following error: Error in file(file, "rt") : cannot open the connection
With S+ I get:
S+>
download.file("http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=
en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&ou
tput=csv", destfile="e:/temp/splus")
Problem in
download.file("http://spreadsheets0.google.com/spreadsheet/pu..: Could
not get url: un
supported protocol, libcurl was built with SSL disabled, https: not
supported!
and with cygwin's wget I get
E:\temp\jnk>wget
"http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDT
Vek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&outpu
t=csv"
--2011-04-29 11:00:10--
http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTV
ek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=
0&output=csv
Resolving spreadsheets0.google.com... 74.125.224.73, 74.125.224.71,
74.125.224.64, ...
Connecting to spreadsheets0.google.com|74.125.224.73|:80... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location:
https://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDT
Vek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv [
following]
--2011-04-29 11:00:11--
https://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDT
Vek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid
=0&output=csv
Connecting to spreadsheets0.google.com|74.125.224.73|:443... connected.
ERROR: cannot verify spreadsheets0.google.com's certificate, issued by
`/C=US/O=Google Inc/CN=Google Internet Authority':
Unable to locally verify the issuer's authority.
To connect to spreadsheets0.google.com insecurely, use
`--no-check-certificate'.
Unable to establish SSL connection.
so I suspect that the SLL/certifcate business may also be the problem
when
using R to get the document. The R error message is not very
illuminating.
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
I'm on windows 7. And the code was tried on R 2.12 and 2.13 I remember trying this a few months ago and it worked fine.
I am always amused at such claims. Occasionally they are correct, but more often a crucial step has been omitted. In this case you have at a minimum embedded line-feeds in your URL string and have not established a connection, so it could not possibly have succeeded as presented. But now it's time to admit I do not know why it is not succeeding when I correct those flaws.
> closeAllConnections() > data_url <-
url("http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=
en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=tru
e&gid=0&output=csv
")
> read.csv(data_url)
Error in open.connection(file, "rt") : cannot open the connection
> closeAllConnections() > dd <- read.csv(con <-
url("http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=
en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=tru
e&gid=0&output=csv
"))
Error in open.connection(file, "rt") : cannot open the connection
So, I guess I'm not reading the help pages for `url` and
`read.csv` as
well I thought I was.
Any suggestion what might be causing this or how to solve it?
-- David Winsemius, MD West Hartford, CT
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Thanks David for fixing the early issues. The reason for the failure is that the response from the Web server is a to redirect the requester to another page, specifically https://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv Note that this is https, not http, and the built-in URL reading facilities in R don't suport https. One way to see this is to use look at the headers in your browser (e.g. Live HTTP Headers), or to use curl, or the RCurl package tt = getForm("http://spreadsheets0.google.com/spreadsheet/pub", hl ="en", key = "0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE", single = "true", gid ="0", output = "csv", .opts = list(followlocation = TRUE, verbose = TRUE)) The verbose option shows the entire dialog, and tt contains the text of the CSV document. read.csv(textConnection(tt)) then yields the data frame D.
On 4/29/11 10:36 AM, David Winsemius wrote:
On Apr 29, 2011, at 11:19 AM, Tal Galili wrote:
Hello all, I wish to use read.csv to read a google doc spreadsheet. I try using the following code: data_url <- " http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv " read.csv(data_url) Which results in the following error: Error in file(file, "rt") : cannot open the connection I'm on windows 7. And the code was tried on R 2.12 and 2.13 I remember trying this a few months ago and it worked fine.
I am always amused at such claims. Occasionally they are correct, but more often a crucial step has been omitted. In this case you have at a minimum embedded line-feeds in your URL string and have not established a connection, so it could not possibly have succeeded as presented. But now it's time to admit I do not know why it is not succeeding when I correct those flaws.
closeAllConnections() data_url <-
url("http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv")
read.csv(data_url)
Error in open.connection(file, "rt") : cannot open the connection
closeAllConnections() dd <- read.csv(con <-
url("http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv"))
Error in open.connection(file, "rt") : cannot open the connection
So, I guess I'm not reading the help pages for `url` and `read.csv` as well I thought I was.
Any suggestion what might be causing this or how to solve it?
On Fri, Apr 29, 2011 at 06:19:24PM +0300, Tal Galili wrote:
data_url <- " http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv " read.csv(data_url) Error in file(file, "rt") : cannot open the connection
I get the same error (R 2.11.1, Debian LINUX) and don't have a solution. But I did some tests and found the origin of the problem I can download the file from google with wget but get some interesting ?information in the process: ------------------------------------------------------------ $ wget -v 'http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv' --2011-04-29 20:07:40-- http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv Resolving spreadsheets0.google.com... 209.85.148.139, 209.85.148.113, 209.85.148.138, ... Connecting to spreadsheets0.google.com|209.85.148.139|:80... connected. HTTP request sent, awaiting response... 302 Moved Temporarily Location: https://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv [following] --2011-04-29 20:07:41-- https://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv Connecting to spreadsheets0.google.com|209.85.148.139|:443... connected. HTTP request sent, awaiting response... 200 OK Length: unspecified [text/plain] Saving to: ?pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv.1? [ <=> ] 41 --.-K/s in 0s 2011-04-29 20:07:42 (342 KB/s) - ?pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv.1? saved [41] ------------------------------------------------------------ The message that caught my attention was the http redirection: "302 Moved Temporarily". If you try again with the new url you get this:
read.csv(url("https://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&g"))
Error in open.connection(file, "rt") : cannot open the connection In addition: Warning message: In open.connection(file, "rt") : unsupported URL scheme ?url told me "Note that ?https://? connections are not supported." Case closed, problem unsolved... Dirty workaround: use system() and wget or whatever command is available on Windows for this. cu Philipp
Dr. Philipp Pagel Lehrstuhl f?r Genomorientierte Bioinformatik Technische Universit?t M?nchen Wissenschaftszentrum Weihenstephan Maximus-von-Imhof-Forum 3 85354 Freising, Germany http://webclu.bio.wzw.tum.de/~pagel/
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110430/0e9f8064/attachment.pl>
Hi Tal
You can add
ssl.verifypeer = FALSE
in the .opts list so that the certificate is simply accepted.
Alternatively, you can tell libcurl where to find the certification
authority file containing signatures. This can be done via the cainfo
option, e.g.
cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"),
Often such a collection of certificates is installed with the ssl library.
D.
On 4/29/11 2:42 PM, Tal Galili wrote:
Hello Duncan, Thank you for having a look at this. I tried the code you provided but it failed in the getForm stage. running this:
> tt = getForm("http://spreadsheets0.google.com/spreadsheet/pub",
+ hl ="en", key = "0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE",
+ single = "true", gid ="0",
+ output = "csv",
+ .opts = list(followlocation = TRUE, verbose = TRUE))
Resulted in the following error:
Error in curlPerform(url = url, headerfunction = header$update, curl = curl, :
SSL certificate problem, verify that the CA cert is OK. Details:
error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
Did I miss some step?
----------------Contact Details:-------------------------------------------------------
Contact me: Tal.Galili at gmail.com <mailto:Tal.Galili at gmail.com> | 972-52-7275845
Read me: www.talgalili.com <http://www.talgalili.com> (Hebrew) | www.biostatistics.co.il
<http://www.biostatistics.co.il> (Hebrew) | www.r-statistics.com <http://www.r-statistics.com> (English)
----------------------------------------------------------------------------------------------
On Fri, Apr 29, 2011 at 9:18 PM, Duncan Temple Lang <duncan at wald.ucdavis.edu <mailto:duncan at wald.ucdavis.edu>> wrote:
Thanks David for fixing the early issues.
The reason for the failure is that the response
from the Web server is a to redirect the requester
to another page, specifically
https://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv
<https://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv>
Note that this is https, not http, and the built-in URL reading facilities in R don't suport https.
One way to see this is to use look at the headers in your browser (e.g. Live HTTP Headers),
or to use curl, or the RCurl package
tt = getForm("http://spreadsheets0.google.com/spreadsheet/pub",
hl ="en", key = "0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE",
single = "true", gid ="0",
output = "csv",
.opts = list(followlocation = TRUE, verbose = TRUE))
The verbose option shows the entire dialog, and tt contains the
text of the CSV document.
read.csv(textConnection(tt))
then yields the data frame
D.
On 4/29/11 10:36 AM, David Winsemius wrote:
>
> On Apr 29, 2011, at 11:19 AM, Tal Galili wrote:
>
>> Hello all,
>> I wish to use read.csv to read a google doc spreadsheet.
>>
>> I try using the following code:
>>
>> data_url <- "
>>
http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv <http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv>
>>
>> "
>> read.csv(data_url)
>>
>> Which results in the following error:
>>
>> Error in file(file, "rt") : cannot open the connection
>>
>>
>> I'm on windows 7. And the code was tried on R 2.12 and 2.13
>>
>> I remember trying this a few months ago and it worked fine.
>
> I am always amused at such claims. Occasionally they are correct, but more often a crucial step has been omitted. In
> this case you have at a minimum embedded line-feeds in your URL string and have not established a connection, so it
> could not possibly have succeeded as presented.
>
> But now it's time to admit I do not know why it is not succeeding when I correct those flaws.
>
>> closeAllConnections()
>> data_url <-
>
url("http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv
<http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv>")
>
>> read.csv(data_url)
> Error in open.connection(file, "rt") : cannot open the connection
>
>> closeAllConnections()
>> dd <- read.csv(con <-
>
url("http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv
<http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv>"))
>
> Error in open.connection(file, "rt") : cannot open the connection
>
>
> So, I guess I'm not reading the help pages for `url` and `read.csv` as well I thought I was.
>
>
>> Any suggestion what might be causing this or how to solve it?
>
>
______________________________________________
R-help at r-project.org <mailto:R-help at r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110430/c33e3a48/attachment.pl>