Argument recycling in substring()
-----Original Message-----
From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On Behalf Of Martin Maechler
Sent: Friday, June 04, 2010 2:46 AM
To: Hervé Pagès
Cc: r-devel at stat.math.ethz.ch
Subject: Re: [Rd] Argument recycling in substring()
"HP" == Hervé Pagès <hpages at fhcrc.org>
on Thu, 03 Jun 2010 17:53:33 -0700 writes:
HP> Hi,
HP> According to its man page substring() "expands (its) arguments
HP> cyclically to the length of the longest _provided_ none are of
HP> zero length".
HP> So, as expected, I get an error here:
>> substring("abcd", first=2L, last=integer(0))
HP> Error in substring("abcd", first = 2L, last = integer(0)) :
HP> invalid substring argument(s)
HP> But I don't get one here:
>> substring(character(0), first=1:2, last=3L)
HP> character(0)
HP> which is unexpected according to the documentation.
My gut feeling is that the documentation should be
updated in this case, rather than the implementation.
RFC! Other opinions?
I think it would be nice if multiargument vectorized
functions in core R followed the rules used by
the arithmetic functions (`+`, etc.):
  a) if any argument has length 0, then the output
     has length 0;
  b) otherwise, the output has the length of the
     longest input.
The arithmetic functions also warn if the output length
is not a multiple of some input length. (They actually
warn 'longer ... length is not a multiple of shorter ...'
and I'm extrapolating that to more than two arguments.)
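A minimal R sketch of rules a) and b), plus the extrapolated "not a multiple" warning, may make the proposal concrete. The function name `recycled_length()` and its warning text are hypothetical; this is not an existing base R function, only an illustration of the proposed rule. It assumes R 3.2.0 or later for `lengths()`.

```r
# Hypothetical helper, not part of base R: the output length a vectorized
# function would have under the arithmetic-style rules a) and b).
recycled_length <- function(...) {
  lens <- lengths(list(...))
  # Rule a): any zero-length argument (or no arguments at all) gives length 0.
  if (length(lens) == 0L || any(lens == 0L)) return(0L)
  # Rule b): otherwise, the length of the longest input.
  n <- max(lens)
  # Extrapolated warning: some input length does not divide the output length.
  if (any(n %% lens != 0L))
    warning("longer argument length is not a multiple of shorter argument length")
  n
}

recycled_length(integer(0), 1:5)   # 0
recycled_length(1:4, 1:2, 5)       # 4, no warning
recycled_length(1:3, 1:2)          # 3, with a warning
```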
Most other multi-vectorized functions (e.g., log, pnorm)
don't currently warn.
If they all followed the same rules then it would be easier
to write code involving unfamiliar functions. The rule
could be stated in one help file and a help file for a
given function could say that arguments x, y, and z,
but not a or b, are 'vectorized', with a link to the one
help file describing vectorization. Even better, the C
and C++ APIs could be expanded to do the standard
multivectorization so not every function would do it
in its own way.
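The idea of doing the recycling once, in a shared routine, can be approximated at the R level. The sketch below assumes a hypothetical `recycle_args()` helper (not an existing R or C API) and two equally hypothetical functions, `substring_sketch()` and `pmax2_sketch()`, which both delegate to it instead of recycling in their own way. It assumes R 3.2.0 or later for `lengths()` and `rep_len()`.

```r
# Hypothetical shared helper (not an existing R or C API): recycle every
# argument to one common length using the zero rule and the length-of-longest rule.
recycle_args <- function(...) {
  args <- list(...)
  lens <- lengths(args)
  # Zero rule: any zero-length argument makes the common length zero.
  n <- if (length(lens) == 0L || any(lens == 0L)) 0L else max(lens)
  if (n > 0L && any(n %% lens != 0L))
    warning("longer argument length is not a multiple of shorter argument length")
  # Names given in the call (e.g. text =, first =) are kept on the result list.
  lapply(args, rep_len, length.out = n)
}

# Two hypothetical vectorized functions sharing the same recycling step.
substring_sketch <- function(text, first, last = 1000000L) {
  a <- recycle_args(text = text, first = first, last = last)
  substr(a$text, a$first, a$last)
}
pmax2_sketch <- function(x, y) {
  a <- recycle_args(x = x, y = y)
  ifelse(a$x >= a$y, a$x, a$y)
}

substring_sketch(character(0), 1:2, 3L)   # character(0)
pmax2_sketch(1:4, 2)                      # 2 2 3 4
```

Unlike substring(), this sketch warns for `substring_sketch("abcd", 1:3, 4:3)`, because 2 does not divide 3; that follows the arithmetic-style rule proposed above rather than current behaviour.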
Some functions cannot be changed to follow that rule because
it would break too much code (e.g., paste() and cat()).
However, why shouldn't substring() return character(0) if
any argument has length 0?
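The suggestion that substring() return character(0) whenever any argument has length zero can be sketched as a thin wrapper. `substring_zero()` is a hypothetical name used only for illustration; it is not part of base R, and it simply short-circuits before calling the real substring().

```r
# Hypothetical wrapper, not part of base R: return character(0) if any
# argument has length zero; otherwise defer to base::substring().
substring_zero <- function(text, first, last = 1000000L) {
  if (length(text) == 0L || length(first) == 0L || length(last) == 0L)
    return(character(0))
  substring(text, first, last)
}

substring_zero("abcd", first = 2L, last = integer(0))    # character(0), no error
substring_zero(character(0), first = 1:2, last = 3L)     # character(0)
substring_zero("abcd", first = 1:3, last = 4:3)          # "abcd" "bc" "cd"
```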
By the way, the 'zero rule' is there so we don't have to
write so many if(length(x)>0) statements around things like
    which(x) + 1
or
    substring(x, 1, nchar(x)-1)
where the scalar 1 would otherwise cause NAs to arise.
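Both expressions can be checked directly: with zero-length input, neither needs an `if (length(x) > 0)` guard. The variable names below are illustrative only.

```r
x <- c(FALSE, FALSE)
# which() finds no TRUE elements and returns integer(0); adding the
# scalar 1 keeps the length at 0 instead of producing an NA.
after <- which(x) + 1
after                                  # numeric(0)

words <- c("abc", "de")
# Drop the last character of each string.
trimmed <- substring(words, 1, nchar(words) - 1)
trimmed                                # "ab" "d"

empty <- character(0)
# Zero-length input gives zero-length output, so no guard is needed.
trimmed_empty <- substring(empty, 1, nchar(empty) - 1)
trimmed_empty                          # character(0)
```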
[Perhaps I should not state my opinion so forcibly, since,
for legal reasons, I'm not in a position to change core R code.]
Bill Dunlap
Spotfire, TIBCO Software
wdunlap at tibco.com
HP> Otherwise, yes, substring() will recycle its arguments to the
HP> length of the longest:
>> substring("abcd", first=1:3, last=4:3)
HP> [1] "abcd" "bc" "cd"
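To make this example concrete, the recycling can be done by hand with `rep_len()` (available since R 3.0.0) and the result compared with substring(). The lengths 3 and 2 are not multiples of each other, yet, as the output above shows, substring() issues no warning. The variable names are illustrative.

```r
# Recycle every argument by hand to the longest length, 3.
n     <- 3L
text  <- rep_len("abcd", n)   # "abcd" "abcd" "abcd"
first <- rep_len(1:3, n)      # 1 2 3
last  <- rep_len(4:3, n)      # 4 3 4 (length 2 does not divide 3)

# substr() then works element by element on equal-length inputs.
by_hand <- substr(text, first, last)
by_hand                                                   # "abcd" "bc" "cd"
identical(by_hand, substring("abcd", first = 1:3, last = 4:3))   # TRUE
```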
HP> Cheers,
HP> H.
HP> --
HP> Hervé Pagès
HP> Program in Computational Biology
HP> Division of Public Health Sciences
HP> Fred Hutchinson Cancer Research Center
HP> 1100 Fairview Ave. N, M2-B876
HP> P.O. Box 19024
HP> Seattle, WA 98109-1024
HP> E-mail: hpages at fhcrc.org
HP> Phone: (206) 667-5791
HP> Fax: (206) 667-1319
HP> ______________________________________________
HP> R-devel at r-project.org mailing list
HP> https://stat.ethz.ch/mailman/listinfo/r-devel