Thanks. Interestingly, your code works on my Mac (OS X 10.6.1) but not on my Windows XP machine. See the sessionInfo() output for both below.

Mac R:
sessionInfo()
R version 2.9.2 (2009-08-24)
i386-apple-darwin8.11.1

locale:
fi_FI.UTF-8/fi_FI.UTF-8/C/C/fi_FI.UTF-8/fi_FI.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] XML_2.6-0

WinXP:
sessionInfo()
R version 2.9.2 (2009-08-24)
i386-pc-mingw32

locale:
LC_COLLATE=Finnish_Finland.1252;LC_CTYPE=Finnish_Finland.1252;LC_MONETARY=Finnish_Finland.1252;LC_NUMERIC=C;LC_TIME=Finnish_Finland.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] XML_2.6-0      RCurl_1.2-1    bitops_1.0-4.1

loaded via a namespace (and not attached):
[1] tools_2.9.2

-L

2009/12/31 Eduardo Leoni <leoniedu at msu.edu>:
In the meantime, try this.

library(XML)
theurl <- "http://www.aarresaari.net/jobboard/jobs.html"
download.file(theurl, "tmp.html")
txt <- readLines("tmp.html")
txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes = TRUE)
g <- xpathSApply(txt, "//p", function(x) xmlValue(x))
head(grep(" ", g, value=TRUE))

It works for me: