as.Date (and strptime?) does not recognize "  " as a blank

There is some misunderstanding here.  The space is part of the format 
specified by SG to as.Date(), which passes it to strptime(). So SG asked 
to match a space and complained that a different character is not matched!

Reading the documentation of strptime shows

      '%n' Newline on output, arbitrary whitespace on input.
      '%t' Tab on output, arbitrary whitespace on input.

so one might hope that one could use those to specify whitespace instead 
of ASCII space in the format.  But unfortunately whether a Unicode 
no-break space (U+00A0) is whitespace is a matter of opinion -- for 
example the PCRE author changed his a few years back.

We don't have a reproducible example, but my attempt at reproduction 
suggests that U+00A0 is not regarded as whitespace on the system I used. 
We know this to be platform-specific (the check uses the C function 
iswspace): glibc does not regard U+00A0 as whitespace and the replacement 
functions used by R on macOS and Windows have followed suit.
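
A minimal sketch of how one might probe this on a given system.  The 
input strings are hypothetical (they are not SG's data), and no results 
are shown for U+00A0 because they depend on the platform:

```r
# Hypothetical input: fields separated by U+00A0 (no-break space).
x_nbsp  <- "25\u00a006\u00a02022"
# Control: the same fields separated by ASCII space.
x_ascii <- "25 06 2022"

# Format with literal ASCII spaces; the control should parse.
as.Date(x_ascii, format = "%d %m %Y")
as.Date(x_nbsp,  format = "%d %m %Y")

# %t is documented as 'arbitrary whitespace on input'; whether that
# covers U+00A0 is up to the platform's notion of whitespace.
as.Date(x_nbsp, format = "%d%t%m%t%Y")

# The corresponding question for regexps.
grepl("[[:space:]]", "\u00a0")
grepl("[[:blank:]]", "\u00a0")
```

When a format is supplied, a failed match shows up as NA rather than as 
an error.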

In short, ASCII space matches only itself, and the interpretation of 
'blank' (in regexps) or 'whitespace' (in strptime or regexps) is 
platform-specific and liable to change.
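
If the input is known to contain U+00A0, one way to sidestep the 
platform dependence is to convert it to ASCII space before parsing.  A 
sketch only: normalize_nbsp is a hypothetical helper name, not part of 
R, and the input string is made up for illustration.

```r
# Hypothetical helper: replace each U+00A0 by an ASCII space.
# fixed = TRUE treats the pattern as a literal string, not a regexp.
normalize_nbsp <- function(x) gsub("\u00a0", " ", x, fixed = TRUE)

x <- "25\u00a006\u00a02022"

# After normalization the ASCII spaces in the format match the input.
d <- as.Date(normalize_nbsp(x), format = "%d %m %Y")
format(d)
```

This handles only U+00A0; other Unicode space characters would need 
their own replacement.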

On 25/06/2022 14:13, Spencer Graves wrote: