as.Date (and strptime?) does not recognize "  " as a blank

It depends a bit on what you mean by "automatically". This seems to work
for me (note that it has NOT been extensively tested on different OSes, or
even in different locales/encodings):

library(XML)
myhtml <- paste0(
  "<html><body><table id='hiya'><tr><th>colname</th></tr>",
  "<tr><td>&nbsp;</td></tr><tr><td> </td></tr></table></body></html>"
)
doc <- htmlParse(myhtml, asText = TRUE)
oldway <- readHTMLTable(doc, trim = FALSE)

identical(oldway$hiya$colname[1], oldway$hiya$colname[2]) # FALSE :(

decode_nbsp <- function(x) {
  # 0xC2 0xA0 is the UTF-8 encoding of U+00A0 (no-break space)
  nbsp <- rawToChar(as.raw(c(0xc2, 0xa0)))
  gsub(nbsp, " ", x, fixed = TRUE, useBytes = TRUE)
}
fancypants <- function(node) decode_nbsp(xmlValue(node))
newandfancy <- readHTMLTable(doc, trim = FALSE, elFun = fancypants)

identical(newandfancy$hiya$colname[1], newandfancy$hiya$colname[2]) # TRUE :D
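
If you want to try the byte-level replacement without the XML package, here
is a minimal base-R sketch. It assumes the input is UTF-8 encoded; the names
nbsp, x and parsed, and the "%d %m %Y" format, are made up for illustration
and are not from the original report. Whether the undecoded string parses
may well depend on platform and locale, so the sketch only checks the
decoded result.

```r
# UTF-8 bytes of U+00A0 (no-break space), built from raw bytes so the
# script itself stays pure ASCII
nbsp <- rawToChar(as.raw(c(0xc2, 0xa0)))

# Replace every no-break space with an ordinary space, byte-wise
decode_nbsp <- function(x) gsub(nbsp, " ", x, fixed = TRUE, useBytes = TRUE)

# Hypothetical input: a date whose fields are separated by no-break spaces
x <- paste0("24", nbsp, "06", nbsp, "2022")

cleaned <- decode_nbsp(x)   # "24 06 2022", with ordinary spaces
parsed  <- as.Date(cleaned, format = "%d %m %Y")
print(parsed)
```

Doing the replacement with fixed = TRUE and useBytes = TRUE avoids any
dependence on how the current locale classifies the character, which is the
part I would trust least across platforms.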

Best,
~G

On Fri, Jun 24, 2022 at 11:48 PM Spencer Graves <spencer.graves at prodsyse.com>
wrote:

  
  
Message-ID: <CAD4oTHFEjdms=scxCRsbOXtPLoWBOwCvm=KiF6Q_9hW62nyk5g@mail.gmail.com>
In-Reply-To: <b7458796-4406-7da9-c893-db33f819b0a9@prodsyse.com>