Christoph Lehmann wrote:
entry from html:
<tr bgcolor=#9090f0><td align="right"><b>BM</b></td><td>
0.952</td><td> 0.136</td><td> 6.984</td><td>0.000000</td></tr>
<tr bgcolor=#9090f0><td align="right"><b>BH</b></td><td>
1.338</td><td> 0.136</td><td> 9.821</td><td>0.000000</td></tr>
using
left.data<- scan(paste(path, left.file, sep = ""), what = 'character',
sep=c("<td>", "</td>"))
yields
> left.data
 [1] " "                   "tr bgcolor=#9090f0>" "td align=right>"
 [4] "b>BM"                "/b>"                 "/td>"
 [7] "td> 0.952"           "/td>"                "td> 0.136"
[10] "/td>"                "td> 6.984"           "/td>"
[13] "td>0.000000"         "/td>"                "/tr>"
[16] " "                   "tr bgcolor=#9090f0>" "td align=right>"
[19] "b>BH"                "/b>"                 "/td>"
[22] "td> 1.338"           "/td>"                "td> 0.136"
[25] "/td>"                "td> 9.821"           "/td>"
[28] "td>0.000000"         "/td>"                "/tr>"
why doesn't it detect the whole '<tr>' as sep?
Uwe Ligges wrote:
Christoph Lehmann wrote:
Hi, I am trying to import HTML text and I need to split the fields at each <td> or </td> entry. How can I do that? sep = '<td>' doesn't yield the right result.
If it fits pairwise together, use
sep=c("<td>", "</td>")
Apologies, one should not send untested code. "sep" must be a single character rather than a string containing more than one character. So you may want to try out my second suggestion. Uwe Ligges
If not, you can read the whole lot with readLines() and then use strsplit() on both patterns, for example. Uwe Ligges
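A minimal sketch of the readLines/strsplit idea, not code from the thread. It assumes the rows look like the two posted above, and the names html, split_row, fields and left.table are made up for illustration. The sample lines are typed in directly so the snippet runs on its own; with the real file they would presumably come from readLines(paste(path, left.file, sep = "")).

```r
# Sample input: the two rows from the original post, one element per line,
# as readLines() would return them.
html <- c(
  '<tr bgcolor=#9090f0><td align="right"><b>BM</b></td><td>',
  '0.952</td><td> 0.136</td><td> 6.984</td><td>0.000000</td></tr>',
  '<tr bgcolor=#9090f0><td align="right"><b>BH</b></td><td>',
  '1.338</td><td> 0.136</td><td> 9.821</td><td>0.000000</td></tr>'
)

# Join the lines, then cut the text into one string per table row.
txt  <- paste(html, collapse = "")
rows <- strsplit(txt, "</tr>", fixed = TRUE)[[1]]

# Hypothetical helper: split one row at every opening or closing td tag
# (a regular expression, so both patterns are handled in one call),
# remove leftover tags such as <tr ...> and <b>, trim whitespace,
# and drop the empty pieces that adjacent tags leave behind.
split_row <- function(row) {
  cells <- strsplit(row, "</?td[^>]*>")[[1]]
  cells <- gsub("<[^>]*>", "", cells)
  cells <- gsub("^[[:space:]]+|[[:space:]]+$", "", cells)
  cells[nzchar(cells)]
}

fields <- lapply(rows, split_row)

# Stack the rows into a character matrix; this assumes every row
# has the same number of cells.
left.table <- do.call(rbind, fields)
print(left.table)
```

Unlike the sep argument of scan(), the split argument of strsplit() is a regular expression (or a fixed string with fixed = TRUE), so a multi-character tag can act as the separator. The numeric columns come back as character strings and can be converted with as.numeric() if needed.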
thanks for hints
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html