extracting tables from web pages?
On 4/25/2013 1:19 PM, Dirk Eddelbuettel wrote:
On 25 April 2013 at 13:00, Spencer Graves wrote:
| Hello:
|
| What tools would you recommend for extracting the table of
| members of the US House of representatives from
| "http://house.gov/representatives/" and
| "http://en.wikipedia.org/wiki/List_of_current_members_of_the_United_States_House_of_Representatives_by_age"?
|
| I started writing something using getURL{RCurl}. However, I'm
| getting bogged down manually selecting character sequences to search for
| and split on.

You could try your own sos package to search what others have done here; the
XML package is popular for it but the whole scheme is fraught with little
pitfalls as html very definitely is not a good format for data-delivery, and
an html page clearly is no API for data access.

Thanks to Gabriel Becker and Dirk Eddelbuettel for suggesting
XML: Its "readHTMLTable" solves my problem.
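
For the archives, a minimal sketch of that kind of call, assuming the XML package is installed. The inline HTML string and its placeholder rows are made up for illustration so that the snippet runs without network access; the names html_text, doc and members are likewise only illustrative. On a live page the table index and header handling would need checking by hand.

```r
library(XML)

# Hypothetical stand-in for a downloaded page (placeholder rows, not real data).
html_text <- paste0(
  "<html><body><table>",
  "<tr><th>Name</th><th>State</th></tr>",
  "<tr><td>Member A</td><td>State X</td></tr>",
  "<tr><td>Member B</td><td>State Y</td></tr>",
  "</table></body></html>")

# Parse the HTML text into a document tree.
doc <- htmlParse(html_text, asText = TRUE)

# Extract the first table as a data frame, treating its first row as the header.
members <- readHTMLTable(doc, which = 1, header = TRUE,
                         stringsAsFactors = FALSE)
print(members)
```

Without the "which" argument, readHTMLTable returns a list with one element per table on the page, which is often the easier way to find the right one.
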
I confess that I tried "sos" before posting to this list without
getting useful results: The search terms I tried returned too many
matches to be useful.
And Gabriel was correct in that I should have sent the question
to R-Help, but I only concluded that after sending it here.
Thanks again.
Spencer
Dirk