scan html: sep = "<td>"

3 messages · Christoph Lehmann, Uwe Ligges, Eric Lecoutre

#
Hi
I am trying to import HTML text and need to split the fields at each <td> or 
</td> tag.

How can I do this? sep = '<td>' doesn't yield the right result.

Thanks for any hints.
#
Christoph Lehmann wrote:

            
If the tags always come in matching pairs, use
   sep=c("<td>", "</td>")

If not, you can, for example, read the whole lot with readLines and then 
use strsplit on both patterns.
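
A minimal sketch of the readLines plus strsplit route, under some assumptions: the sample lines below are made up and stand in for readLines("yourfile.html"), the cells carry no attributes (plain <td>, not <td class="...">), and every <td> has a matching </td>. As far as I can tell from ?scan, sep is meant to be a single character, so a multi-character tag may not be accepted there; this route sidesteps that.

```r
# Made-up sample input; with a real file this would be
# lines <- readLines("yourfile.html")   # hypothetical file name
html <- c("<table><tr>",
          "<td>alpha</td><td>1.5</td>",
          "<td>beta</td><td>2.5</td>",
          "</tr></table>")
con <- textConnection(html)
lines <- readLines(con)
close(con)

# Collapse to a single string so that a cell spanning several
# lines is not cut at the line break
txt <- paste(lines, collapse = "")

# Split at every opening or closing td tag; "</?td>" is a regular
# expression in which the slash is optional
pieces <- strsplit(txt, "</?td>")[[1]]

# With properly paired tags the pieces alternate between text outside
# a cell (odd positions) and cell contents (even positions)
cells <- pieces[seq(2, length(pieces), by = 2)]
print(cells)
```

If the fields are numeric, as.numeric() or type.convert() can be applied to the relevant elements of cells afterwards.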

Uwe Ligges
#
You can import the whole thing and then use "strsplit" on it:

?strsplit
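
A small illustration of how strsplit behaves on such text; the string x is an invented example. It shows why splitting on "<td>" alone leaves the closing tags in the fields, and how a single regular expression can cover both tags.

```r
# Invented example string with two cells
x <- "<td>a</td><td>b</td>"

# Literal split on the opening tag only: each field keeps its "</td>"
only_open <- strsplit(x, "<td>", fixed = TRUE)[[1]]
print(only_open)   # "" "a</td>" "b</td>"

# Regular expression matching either tag
both <- strsplit(x, "</?td>")[[1]]
print(both)        # "" "a" "" "b"

# Drop the empty strings left between adjacent tags
fields <- both[nzchar(both)]
print(fields)      # "a" "b"
```

Note that strsplit returns a list with one element per input string, hence the [[1]].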

Eric

Eric Lecoutre
UCL /  Institut de Statistique