how to read a web page and extract an html table?
On Tue, 6 May 2003 07:31:29 -0700 (PDT), you wrote in message <20030506143129.33487.qmail at web12105.mail.yahoo.com>:
I want to extract the table from the html file. Is there a function html2R, the opposite of R2html? How should I do this?
I don't think there is anything that does that, but the XML package (from CRAN) contains a function called htmlTreeParse that should get you partway there.
Duncan Murdoch
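As a rough illustration of the XML-package route, here is a minimal sketch. It assumes the XML package is installed and that the installed version provides readHTMLTable (added to the package after this thread was written). The inline HTML string and its column names are made up for the example; they stand in for a real downloaded page.

```r
# Minimal sketch: parse an HTML document and pull out its first table.
# Assumes the XML package from CRAN is installed: install.packages("XML")
library(XML)

# Hypothetical page content, used here instead of a real URL
html <- '<html><body>
<table>
<tr><th>Firstcol</th><th>Secondcol</th><th>Thirdcol</th></tr>
<tr><td>1</td><td>a</td><td>2.5</td></tr>
<tr><td>2</td><td>b</td><td>3.5</td></tr>
</table>
</body></html>'

# Parse the text into an internal document tree
doc <- htmlTreeParse(html, asText = TRUE, useInternalNodes = TRUE)

# readHTMLTable returns a list with one data frame per <table> element
tables <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)
tbl <- tables[[1]]

# Cells come back as character strings; convert columns as needed
tbl$Firstcol <- as.numeric(tbl$Firstcol)
tbl$Thirdcol <- as.numeric(tbl$Thirdcol)
```

For a real page, the file name or URL would be passed to htmlTreeParse in place of the string, without asText = TRUE.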
Or, if you know (or can learn) Perl, here is a script that will do it (and output the table as CSV). You need to edit $url and @tableheaders, and install WWW::Mechanize and HTML::TableExtract from CPAN (http://cpan.org).

    #!/usr/bin/perl
    use strict;
    use HTML::TableExtract;
    use WWW::Mechanize;

    my $url = "http://shangorilla.syr.edu/testR.html";
    my @tableheaders = qw(Firstcol Secondcol Thirdcol);

    # Fetch the page
    my $agent = WWW::Mechanize->new();
    $agent->get($url);

    # Output headers
    print join(',', @tableheaders), "\n";

    # Find table in html page
    my $te = HTML::TableExtract->new( headers => \@tableheaders );
    $te->parse( $agent->content() );    # parse contents

    # Examine all matching tables (there should be only one?)
    foreach my $ts ($te->table_states) {
        foreach my $row ($ts->rows) {
            print join(',', @$row), "\n";
        }
    }

Copy this into an editor and save it as testRtable.pl, then

    chmod u+x testRtable.pl

Run it as

    ./testRtable.pl

to check the content, then

    ./testRtable.pl > csvforReadingIntoR.txt

Then in R:

    > data <- read.csv("csvforReadingIntoR.txt")

I think that should work for you. (Or just send me the URL and I'll run it and mail you back the CSV, if this is a one-off.)

Speaking of Perl: does anyone know if there is a standard way to use Perl scripts from within R? I guess one can call them as one does from the command line. Is it possible to program R modules in Perl (or would the CPAN dependencies kill us)?

If this is a one-off (i.e. not for scripting), then I think you can directly select a table in IE and paste it into Excel, then save it as CSV to read into R.

Cheers
James
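On calling Perl from within R as one would from the command line: one possible approach is to capture the script's standard output and hand it to read.csv. The sketch below assumes a Unix-like system with a perl executable on the PATH. The Perl one-liner and its two-column output are invented for illustration and stand in for a script such as testRtable.pl.

```r
# Minimal sketch: run Perl from R and read its CSV output directly.
# Assumes a Unix-like shell and that `perl` is on the PATH.

# Hypothetical one-liner standing in for a real script; "\\n" reaches
# Perl as the two characters \n, which Perl turns into a newline
perl_code <- 'print "Firstcol,Secondcol\\n1,2\\n3,4\\n";'

# stdout = TRUE returns the output as a character vector, one line each
csv_lines <- system2("perl", c("-e", shQuote(perl_code)), stdout = TRUE)

# Parse the captured lines as CSV without writing a temporary file
tab <- read.csv(text = csv_lines)

# With an executable script on disk, a pipe connection would also work:
# tab <- read.csv(pipe("./testRtable.pl"))
```

This skips the intermediate csvforReadingIntoR.txt file, at the cost of re-fetching the page each time the R code runs.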
On Tuesday, May 6, 2003, at 11:17 AM, Duncan Murdoch wrote:
______________________________________________ R-help at stat.math.ethz.ch mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help