how to read a web page and extract an html table?
Adrian,
> I want to extract the table from the html file. Is there a function html2R, the opposite of R2html? How should I do this?
Parsing arbitrary HTML is generally a nontrivial task. I would recommend using something like Perl to convert the HTML to delimited ASCII, and then reading the result with read.table(), for example. There are specific Perl modules that can help with the "HTML-2-ASCII" step, if not handle it entirely. I have never used one myself, but I am sure CPAN can be searched for one.

Hope that helps,
Bill

----------------------------------------
Bill Pikounis, Ph.D.
Biometrics Research Department
Merck Research Laboratories
PO Box 2000, MailDrop RY84-16
126 E. Lincoln Avenue
Rahway, New Jersey 07065-0900 USA
v_bill_pikounis at merck.com
Phone: 732 594 3913
Fax: 732 594 1565
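The reply describes a two-step approach: convert the HTML to delimited text, then call read.table(). As a rough illustration of that same idea done in R itself (a swap for the Perl route recommended above, not something the reply proposes), here is a minimal base-R sketch. It assumes a single, simple, well-formed table with no nested tables, no rowspan/colspan, and no HTML entities that need decoding. The name `html_table_to_df` is a hypothetical helper, and the `aux2` vector below is made-up sample data standing in for the output of readLines().

```r
# Hypothetical helper (not part of base R or any package): converts the first
# simple <table> found in a character vector of HTML lines into a data frame.
html_table_to_df <- function(html_lines, header = TRUE) {
  # Work on one long string so rows split across lines are still matched.
  html <- paste(html_lines, collapse = " ")

  # Isolate the first <table>...</table> block (non-greedy, case-insensitive).
  m <- regexpr("(?i)<table\\b.*?</table>", html, perl = TRUE)
  if (m == -1) stop("no <table> element found")
  tbl <- regmatches(html, m)

  # Pull out each row, then each header/data cell within the row.
  rows <- regmatches(tbl, gregexpr("(?i)<tr\\b.*?</tr>", tbl, perl = TRUE))[[1]]
  cells <- lapply(rows, function(r) {
    x <- regmatches(r, gregexpr("(?i)<t[dh]\\b.*?</t[dh]>", r, perl = TRUE))[[1]]
    x <- gsub("<[^>]+>", "", x)  # strip every remaining tag
    trimws(x)
  })

  # Build tab-delimited lines and let read.table() handle type conversion.
  delimited <- vapply(cells, paste, character(1), collapse = "\t")
  read.table(text = delimited, sep = "\t", header = header,
             quote = "", comment.char = "", stringsAsFactors = FALSE)
}

# Made-up sample standing in for aux2 <- readLines(aux1)
aux2 <- c("<html><body>",
          "<table>",
          "<tr><th>name</th><th>value</th></tr>",
          "<tr><td>a</td><td>1.5</td></tr>",
          "<tr><td>b</td><td>2</td></tr>",
          "</table>",
          "</body></html>")

df <- html_table_to_df(aux2)
print(df)
```

Regular expressions like these break easily on real-world pages, which is the point made above about arbitrary HTML being nontrivial. For anything beyond a plain, regular table, a proper HTML parser is the safer choice.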
-----Original Message-----
From: Adi Humbert [mailto:adrian_humbert at yahoo.com]
Sent: Tuesday, May 06, 2003 10:31 AM
To: r-help at stat.math.ethz.ch
Cc: adrian_humbert at yahoo.com
Subject: [R] how to read a web page and extract an html table?

Hello all,

I want to read a table from a given web page. If I do something like
str="http://www...."         # this is the web address
aux1 <- url(str,open="rt")   # open connection
aux2 <- readLines(aux1)      # read web page
aux2 contains the html file. I want to extract the table from the html file. Is there a function html2R, the opposite of R2html? How should I do this?

Thanks,
Adrian
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help