Argument recycling in substring()

I think it would be nice if multiargument vectorized
functions in core R used the rules that are used by
the arithmetic functions (`+`, etc.):
   a) if any argument has length 0, then the output
      has length 0
   b) otherwise the output is as long as the longest
      input
The arithmetic functions also warn if the output length
is not a multiple of some input length.  (They actually
warn 'longer ... length is not a multiple of shorter ...',
and I'm extrapolating that to more than two arguments.)
Most other multi-vectorized functions (e.g., log, pnorm)
don't currently warn.
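To make the rule concrete, here is a rough sketch in R of how a
function might compute its output length from its vectorized
arguments.  The helper name recycledLength() is hypothetical and not
part of base R, and the warning text only paraphrases what `+` says:

```r
# Hypothetical helper, not part of base R: the common output length
# for a set of vectorized arguments under arithmetic-style rules.
recycledLength <- function(...) {
  lens <- vapply(list(...), length, integer(1))
  # Rule (a): any zero-length argument gives a zero-length result
  if (length(lens) == 0L || any(lens == 0L)) return(0L)
  # Rule (b): otherwise the result is as long as the longest argument
  n <- max(lens)
  # Warn, as `+` does, when some input length does not divide n
  if (any(n %% lens != 0L))
    warning("longer argument length is not a multiple of shorter argument length")
  n
}
```

For example, recycledLength(1:4, 1:2, 5) is 4, recycledLength(1:3,
numeric(0)) is 0 (just as length(1:3 + numeric(0)) is 0), and
recycledLength(1:3, 1:2) is 3 with a warning.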

If they all followed the same rules, it would be easier
to write code involving unfamiliar functions.  The rule
could be stated in one help file, and the help file for a
given function could say that arguments x, y, and z,
but not a or b, are 'vectorized', with a link to the one
help file describing vectorization.  Even better, the C
and C++ APIs could be expanded to do the standard
multivectorization, so that each function would not have
to do it in its own way.

Some functions cannot be changed to follow that rule because
it would break too much code (e.g., paste() and cat()).
However, why shouldn't substring() return character(0) if
any argument has length 0?
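A minimal sketch of what that could look like, written as a wrapper
around the existing substring() rather than as a change to it.  The
name substring0() is hypothetical, and the default for 'last' simply
mirrors substring()'s own default:

```r
# Hypothetical wrapper, not part of base R: substring() with the
# arithmetic-style zero rule applied before any recycling.
substring0 <- function(text, first, last = 1000000L) {
  lens <- c(length(text), length(first), length(last))
  # Zero rule: any zero-length argument gives character(0)
  if (any(lens == 0L)) return(character(0))
  n <- max(lens)
  if (any(n %% lens != 0L))
    warning("longer argument length is not a multiple of shorter argument length")
  # Recycle every argument to the common length, then defer to substring()
  substring(rep_len(as.character(text), n), rep_len(first, n), rep_len(last, n))
}
```

With this, substring0("abcdef", 1:3, 3:5) gives c("abc", "bcd",
"cde"), while substring0(character(0), 1, integer(0)) gives
character(0) no matter what the other arguments are.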

By the way, the 'zero rule' is there so we don't have to
write so many if(length(x)>0) statements around things like
    which(x) + 1
or
    substring(x, 1, nchar(x)-1)
where the scalar 1 would otherwise cause NAs to arise.
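To illustrate: the arithmetic case already needs no guard, because
which() on an all-FALSE vector returns integer(0) and adding 1 keeps
it at length 0.  The second half shows the kind of explicit
if(length(x)>0) wrapper people write when they cannot count on the
same rule; dropLastChar() is a hypothetical name used only for this
example:

```r
# Arithmetic already follows the zero rule: no guard needed here.
x <- c(FALSE, FALSE, FALSE)
idx <- which(x) + 1   # which(x) is integer(0), so idx has length 0

# The defensive pattern the zero rule is meant to make unnecessary
# (dropLastChar is a hypothetical helper, not part of base R).
dropLastChar <- function(x) {
  if (length(x) > 0) substring(x, 1, nchar(x) - 1) else character(0)
}
```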

[Perhaps I should not state my opinion so forcibly, since,
for legal reasons, I'm not in a position to change core R code.]

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com