readHTMLTable (XML package)
7 messages · Lopez, Dan, David Winsemius, Ista Zahn
Hi Dan,
A couple of things: first, I think that file really does not exist (at
least I can't open it in my web browser). Second, even if it did,
url() cannot download from https, according to the Details section of
?url, which points you to RCurl. So, once you verify that your URL
actually exists, you can do something like
library(XML)
library(RCurl)
tabs <- readHTMLTable(getURL("http://en.wikipedia.org/wiki/List_of_countries_by_population"))
Best,
Ista
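The two-step pattern suggested above (download the page text with RCurl, then hand that text to the XML parser) can be sketched as follows. This is only a minimal sketch and not part of the original message: the helper name read_tables_from_url is hypothetical, and the sketch parses the downloaded string explicitly with htmlParse(asText = TRUE) so that it is treated as HTML content rather than as a file name.

```r
library(XML)
library(RCurl)

# Hypothetical helper (not from the thread): fetch a page as text, then parse its tables.
read_tables_from_url <- function(page_url, ...) {
  html <- getURL(page_url, ...)            # extra arguments are passed on as curl options
  doc  <- htmlParse(html, asText = TRUE)   # treat the string as HTML content, not a path
  readHTMLTable(doc, stringsAsFactors = FALSE)
}

# Usage (needs network access, so it is left commented out):
# tabs <- read_tables_from_url("https://en.wikipedia.org/wiki/List_of_countries_by_population")
# length(tabs)   # number of <table> elements found on the page
```

Keeping the download and the parse as separate steps also makes it easier to see which of the two fails when an error is reported.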
On Tue, Jan 15, 2013 at 2:59 PM, Lopez, Dan <lopez235 at llnl.gov> wrote:
Hi, I am using XML::readHTMLTable and getting the below error. Does anyone know why? Does this function not work with https? I didn't see anything in help about that.
library(XML)
wampage<-readHTMLTable('https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html',1)
Error in htmlParse(doc) : File https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html does not exist
Dan
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Hi Ista, It does exist. It's a page on our company intranet. It is https, so it looks like I can't use RCurl either. I tried RCurl, by the way, and got the error below. Do you have experience with pulling a table off an https site? If so, how do I do that?
tabs <- readHTMLTable(getURL("https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html"))
Error in readHTMLTable(getURL("https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html")) :
error in evaluating the argument 'doc' in selecting a method for function 'readHTMLTable': Error in function (type, msg, asError = TRUE) :
SSL certificate problem, verify that the CA cert is OK. Details:
error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
Thanks.
Dan
On Jan 15, 2013, at 2:31 PM, Lopez, Dan wrote:
> Hi Ista, It does exist. It's a page on our company intranet. It is https, so it looks like I can't use RCurl either. I tried RCurl, by the way, and got the error below. Do you have experience with pulling a table off an https site? If so, how do I do that?
Why not use a browser and save it locally?
David.
David Winsemius
Alameda, CA, USA
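The "save it locally" suggestion can be sketched as below. This is a minimal, self-contained illustration rather than anything posted in the thread: a small temporary HTML file stands in for a page saved from the browser, and its contents and column names are invented for the example. Note that which = 1 is passed by name here, since the second positional argument of readHTMLTable is header rather than which.

```r
library(XML)

# Stand-in for a page saved from the browser: write a tiny HTML file to a temp path.
saved_page <- tempfile(fileext = ".html")
writeLines(c(
  "<html><body><table>",
  "<tr><th>Group</th><th>Count</th></tr>",
  "<tr><td>A</td><td>10</td></tr>",
  "<tr><td>B</td><td>20</td></tr>",
  "</table></body></html>"
), saved_page)

# readHTMLTable accepts a local file path; which = 1 asks for the first table only.
tab <- readHTMLTable(saved_page, which = 1, header = TRUE,
                     stringsAsFactors = FALSE)
print(tab)
```

This sidesteps the certificate question entirely, at the cost of a manual step each time the page changes.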
David,
Because there is some data on various web pages that I use periodically, and this would be a convenient way for me to get it. Copying and pasting is messy, and obtaining direct database access for the data on some of these pages is not possible for me (i.e. it won't get approved, but I can use what is out there).
Dan
Hi Dan,
On Tue, Jan 15, 2013 at 5:31 PM, Lopez, Dan <lopez235 at llnl.gov> wrote:
> Hi Ista, It does exist. It's a page on our company intranet.
Ah, good.
> It is https, so it looks like I can't use RCurl either. I tried RCurl, by the way, and got the error below.
Well, that error is not because RCurl doesn't work with the https protocol.
In my original example I meant to show
tabs <- readHTMLTable(getURL("https://en.wikipedia.org/wiki/List_of_countries_by_population"))
i.e., getURL() does work with https. (Well, maybe depending on your
version of libcurl. See the getURL help page for details.)
> Do you have experience with pulling a table off an https site?
Yes, I do :)
> If so, how do I do that?
See below.
> tabs <- readHTMLTable(getURL("https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html"))
> Error in readHTMLTable(getURL("https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html")) :
> error in evaluating the argument 'doc' in selecting a method for function 'readHTMLTable': Error in function (type, msg, asError = TRUE) :
> SSL certificate problem, verify that the CA cert is OK. Details:
> error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
This is an RCurl FAQ (see http://www.omegahat.org/RCurl/FAQ.html). The quick and dirty way is
getURL("https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html", ssl.verifypeer = FALSE)
Best,
Ista
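The options for getting past the certificate error can be laid out side by side, as in the sketch below. It is an illustration under stated assumptions, not a tested recipe for this intranet site: the path /path/to/company-ca.pem is hypothetical and stands for a PEM file containing the certificate authority that signed the server's certificate, and the bundled cacert.pem is only used if it is present in the installed RCurl. Setting ssl.verifypeer = FALSE turns certificate verification off, so it is best kept for hosts that are already trusted.

```r
library(XML)
library(RCurl)

# Does the installed libcurl support https at all?
has_https <- "https" %in% curlVersion()$protocols

page_url <- "https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html"

# Option 1: keep verification on and tell libcurl which CA certificates to trust.
# The path is hypothetical; substitute the PEM file for the relevant CA.
# html <- getURL(page_url, cainfo = "/path/to/company-ca.pem")

# Option 2: for public sites, the CA bundle shipped with RCurl may be enough.
# system.file() returns "" if the file is not part of the installation.
ca_bundle <- system.file("CurlSSL", "cacert.pem", package = "RCurl")
# html <- getURL(page_url, cainfo = ca_bundle)

# Option 3: the quick and dirty way from the message above, with verification off.
# html <- getURL(page_url, ssl.verifypeer = FALSE)

# Whichever option is used, the downloaded text is then parsed as before:
# tabs <- readHTMLTable(htmlParse(html, asText = TRUE))
```

The network calls are commented out because the page is only reachable from inside the poster's intranet.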
Ista,
Thank you. That more or less did the trick. I got the data, though it's in a weird format compared to how it appears on the page and needs a lot of cleanup. But I was kind of expecting that.
Dan
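The kind of cleanup mentioned here often comes down to fixing column names and converting text columns to numbers. The sketch below shows one way that might look; the table contents and column names are invented for illustration, since the thread does not show what the real page contains.

```r
library(XML)

# Invented example table: a header with a space and a number with a thousands separator.
html <- "<table>
  <tr><th>Division</th><th>Head count</th></tr>
  <tr><td> Engineering </td><td>1,250</td></tr>
  <tr><td>Operations</td><td>980</td></tr>
</table>"

doc <- htmlParse(html, asText = TRUE)
tab <- readHTMLTable(doc, which = 1, header = TRUE, stringsAsFactors = FALSE)

names(tab) <- make.names(names(tab))                          # syntactic names, e.g. Head.count
tab$Division <- trimws(tab$Division)                          # drop stray whitespace
tab$Head.count <- as.numeric(gsub(",", "", tab$Head.count))   # "1,250" becomes 1250
str(tab)
```

Passing stringsAsFactors = FALSE keeps the cells as plain text, which tends to make this sort of post-processing simpler than working with factors.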