
readHTMLTable (XML package)

7 messages · Lopez, Dan, David Winsemius, Ista Zahn

#
Hi Dan,

A couple of things: first, I think that file really does not exist (at
least I can't open it in my web browser). Second, even if it did,
url() cannot download from https, according to the Details section of
?url, which points you to RCurl. So, once you verify that your URL
actually exists, you can do something like

library(XML)
library(RCurl)
tabs <- readHTMLTable(getURL("http://en.wikipedia.org/wiki/List_of_countries_by_population"))

Best,
Ista
On Tue, Jan 15, 2013 at 2:59 PM, Lopez, Dan <lopez235 at llnl.gov> wrote:
#
Hi Ista,

It does exist. It's a page in our company intranet.

It is https, so it looks like I can't use RCurl either. I tried RCurl, by the way, and got the error below.

Do you have experience with pulling a table off an https site? If so, how do I do that?
Error in readHTMLTable(getURL("https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html")) : 
  error in evaluating the argument 'doc' in selecting a method for function 'readHTMLTable': Error in function (type, msg, asError = TRUE)  : 
  SSL certificate problem, verify that the CA cert is OK. Details:
error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed


Thanks.
Dan

-----Original Message-----
From: Ista Zahn [mailto:istazahn at gmail.com] 
Sent: Tuesday, January 15, 2013 12:22 PM
To: Lopez, Dan
Cc: R help (r-help at r-project.org)
Subject: Re: [R] readHTMLTable (XML package)

#
On Jan 15, 2013, at 2:31 PM, Lopez, Dan wrote:

            
Why not use a browser and save it locally?
#
David,

Because there is data on various web pages that I use periodically, and reading it straight into R would be convenient.
Copying and pasting is messy, and obtaining direct database access for the data on some of these pages is not possible for me (i.e., it won't get approved, but I can use what is out there).

Dan


-----Original Message-----
From: David Winsemius [mailto:dwinsemius at comcast.net] 
Sent: Tuesday, January 15, 2013 3:00 PM
To: Lopez, Dan
Cc: Ista Zahn; R help (r-help at r-project.org)
Subject: Re: [R] readHTMLTable (XML package)

--
David.
David Winsemius
Alameda, CA, USA
#
Hi Dan,
On Tue, Jan 15, 2013 at 5:31 PM, Lopez, Dan <lopez235 at llnl.gov> wrote:
Ah, good.
Well, that error is not because RCurl doesn't work with the https protocol.
In my original example I meant to show

tabs <- readHTMLTable(getURL("https://en.wikipedia.org/wiki/List_of_countries_by_population"))

i.e., getURL() does work with https. (Well, maybe depending on your
version of libcurl. See the getURL help page for details.)
Yes, I do :)
See below
This is an RCurl FAQ (see http://www.omegahat.org/RCurl/FAQ.html). The
quick and dirty way is

getURL("https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html",
ssl.verifypeer = FALSE)

Best,
Ista
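
A note on the quick and dirty route: ssl.verifypeer = FALSE gets past the error, but it also switches off the check that the server is who it claims to be. A minimal sketch of the other route, assuming the fix is to hand libcurl a CA bundle through the cainfo option. The helper name read_https_tables and the .pem path in the usage comment are hypothetical, and an intranet server signed by an internal CA would presumably need that CA's certificate rather than the public bundle shipped with RCurl.

```r
library(XML)
library(RCurl)

# Hypothetical helper: fetch a page over https and parse every <table> in it.
# `cainfo` is the path to a PEM file of trusted CA certificates; the default
# here is the bundle that ships with RCurl.
read_https_tables <- function(url,
                              cainfo = system.file("CurlSSL", "cacert.pem",
                                                   package = "RCurl")) {
  html <- getURL(url, cainfo = cainfo)    # peer verification stays on
  doc  <- htmlParse(html, asText = TRUE)  # treat the string as HTML, not a file name
  readHTMLTable(doc, stringsAsFactors = FALSE)
}

# Usage (not run; the .pem path is an assumption):
# tabs <- read_https_tables(
#   "https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html",
#   cainfo = "path/to/company-ca.pem")
```

If the certificate still fails to verify with the right bundle, that points at the server's certificate chain rather than at RCurl.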
#
Ista,

Thank you. That more or less did the trick. I got the data, though it's in a weird format compared to how it appears on the page and needs a lot of cleanup. But I was kind of expecting that.
Dan
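
For the cleanup step, a small sketch of the usual first moves with readHTMLTable: pick a single table with which, keep text as character, and convert formatted numbers by hand. The HTML string and its values are made up to stand in for the downloaded page, so the column names and the gsub() pattern are assumptions that would need adjusting to the real table.

```r
library(XML)

# Made-up HTML standing in for the page returned by getURL().
html <- '<html><body><table>
<tr><th>Group</th><th>Headcount</th></tr>
<tr><td>A</td><td>1,234</td></tr>
<tr><td>B</td><td>567</td></tr>
</table></body></html>'

doc <- htmlParse(html, asText = TRUE)

# which = 1 asks for the first table only, returned as a data frame
# instead of a list of tables.
tab <- readHTMLTable(doc, which = 1, header = TRUE, stringsAsFactors = FALSE)

# Cells arrive as text; strip thousands separators before converting.
tab$Headcount <- as.numeric(gsub(",", "", tab$Headcount))
str(tab)
```

Calling readHTMLTable() without which and inspecting names() and sapply(tabs, nrow) on the result is a reasonable way to find which table on a busy page holds the data.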


-----Original Message-----
From: Ista Zahn [mailto:istazahn at gmail.com] 
Sent: Tuesday, January 15, 2013 3:18 PM
To: Lopez, Dan
Cc: R help (r-help at r-project.org)
Subject: Re: [R] readHTMLTable (XML package)
