
how to read a web page and extract an html table?

5 messages · Adi Humbert, James Howison, Duncan Murdoch +1 more

#
Hello all, 

I want to read a table from a given web page. 

If I do something like
aux2 contains the html file. 

I want to extract the table from the html file. 
Is there a function html2R, the opposite of R2html? 
How should I do this? 

Thanks, 
Adrian
#
On Tue, 6 May 2003 07:31:29 -0700 (PDT), you wrote in message
<20030506143129.33487.qmail at web12105.mail.yahoo.com>:
I don't think there is anything that does that, but the XML package
(from CRAN) contains a function called htmlTreeParse that should get
you partway there.

Duncan Murdoch
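
As a rough sketch of the htmlTreeParse route suggested above: the snippet below parses a small inline HTML string (a made-up stand-in for the downloaded page that the original post calls aux2) and pulls out its first table. It assumes the XML package from CRAN is installed and that the installed version includes readHTMLTable, a helper added to the package after this thread was written. The column names and values are illustrative only.

```r
library(XML)  # assumes install.packages("XML") has been run

# Hypothetical stand-in for the downloaded page contents
html <- "<html><body>
<table>
  <tr><th>Firstcol</th><th>Secondcol</th><th>Thirdcol</th></tr>
  <tr><td>1</td><td>a</td><td>2.5</td></tr>
  <tr><td>2</td><td>b</td><td>3.5</td></tr>
</table>
</body></html>"

# Parse the text into an internal document tree
doc <- htmlTreeParse(html, asText = TRUE, useInternalNodes = TRUE)

# Convert the first <table> node into a data frame
tab <- readHTMLTable(doc, which = 1, header = TRUE, stringsAsFactors = FALSE)

print(tab)
```

For a real page, the same two calls should work with the page text in place of html. Cells come back as character strings unless column classes are requested, so numeric columns may need converting afterwards.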
#
Or if you know (or can learn) Perl, here is a script that will do it 
(and output the table as a csv).  You need to edit $url and 
@tableheaders, and install WWW::Mechanize and HTML::TableExtract from 
CPAN.  
http://cpan.org

#!/usr/bin/perl

use HTML::TableExtract;
use WWW::Mechanize;

my $url = "http://shangorilla.syr.edu/testR.html";
my @tableheaders = qw (Firstcol Secondcol Thirdcol);

my $agent = WWW::Mechanize->new();
$agent->get($url);

# Output headers
print join(',', @tableheaders), "\n";

# Find table in html page
my $te = HTML::TableExtract->new( headers => \@tableheaders );
$te->parse( $agent->content() ); # parse contents

# Examine all matching tables (there should be only one?)
foreach my $ts ($te->table_states) {
    foreach my $row ($ts->rows) {
        print join(',', @$row), "\n";
    }
}

(copy into editor and save as testRtable.pl then chmod u+x 
testRtable.pl)
run as ./testRtable.pl to check content
then
./testRtable.pl > csvforReadingIntoR.txt

Then in R

 > data <- read.csv("csvforReadingIntoR.txt")

I think that should work for you. (Or just send me the url and I'll run 
it and mail you back the csv, if this is a one-off.)

Speaking of Perl - does anyone know if there is a standard way to use 
Perl scripts from within R? I guess one can call them as one does from 
the command line.  Is it possible to program R modules in Perl (or 
would the CPAN dependencies kill us)?
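
On calling a Perl script from R as one would from the command line, here is a minimal sketch, assuming a perl executable is on the PATH. system2 runs the command and captures its standard output, which read.csv can then parse. The Perl one-liner is a made-up stand-in for testRtable.pl, and the quoting via shQuote assumes a Unix-like shell.

```r
# Locate perl; Sys.which returns "" when it is not on the PATH
perl <- Sys.which("perl")
if (!nzchar(perl)) stop("perl not found on the PATH")

# Hypothetical stand-in for the script: prints a header line and one data row
perl_code <- 'print "Firstcol,Secondcol\\n1,2\\n";'

# Run perl and capture stdout as a character vector, one element per line
out <- system2(perl, args = c("-e", shQuote(perl_code)), stdout = TRUE)

# Parse the captured lines as CSV
dat <- read.csv(text = out)
print(dat)
```

With the script from this message, something like read.csv(pipe("./testRtable.pl")) would presumably do the same in one step, skipping the intermediate file.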

If this is a one-off (i.e. not for scripting) then I think you can 
select a table directly in IE and paste it into Excel, then save it as 
csv to read into R.

Cheers
James
On Tuesday, May 6, 2003, at 11:17 AM, Duncan Murdoch wrote:

        
#
On Tue, 6 May 2003 12:01:42 -0400, you wrote in message
<07AC3439-7FDC-11D7-90EC-00306579408C at syr.edu>:
I don't know the answers to those questions, but wanted to point out
that most users of R on Windows won't have Perl installed, so if you
do this, you should at least give them instructions on where to find
it (like those in the readme.packages, "available via
http://www.activestate.com/Products/ActivePerl/.").

Duncan Murdoch
#
Hi!
On 06-May-2003 Adi Humbert wrote:
I think the easiest way is to use Perl as a preprocessor:
http://www.devshed.com/Server_Side/Perl/DataMining/page3.html


hope this helps,
dst
"There is no way to peace, peace is the way." -- Gandhi

Detlef Steuer --- http://fawn.unibw-hamburg.de/steuer.html
***** Encrypted mail preferred *****