Argument recycling in substring()

6 messages · Hervé Pagès, Martin Maechler, Deepayan Sarkar +3 more

#
Hi,

According to its man page substring() "expands (its) arguments
cyclically to the length of the longest _provided_ none are of
zero length".

So, as expected, I get an error here:

   > substring("abcd", first=2L, last=integer(0))
   Error in substring("abcd", first = 2L, last = integer(0)) :
     invalid substring argument(s)

But I don't get one here:

   > substring(character(0), first=1:2, last=3L)
   character(0)

which is unexpected.

Otherwise, yes substring() will recycle its arguments to the
length of the longest:

   > substring("abcd", first=1:3, last=4:3)
   [1] "abcd" "bc"   "cd"

Cheers,
H.
#
    HP> Hi,
    HP> According to its man page substring() "expands (its) arguments
    HP> cyclically to the length of the longest _provided_ none are of
    HP> zero length".

    HP> So, as expected, I get an error here:

    >> substring("abcd", first=2L, last=integer(0))
    HP> Error in substring("abcd", first = 2L, last = integer(0)) :
    HP> invalid substring argument(s)

    HP> But I don't get one here:

    >> substring(character(0), first=1:2, last=3L)
    HP> character(0)

    HP> which is unexpected.
according to the documentation.

My gut feeling would say that the documentation should be
updated in this case, rather than the implementation.

RFC! Other opinions?


    HP> Otherwise, yes substring() will recycle its arguments to the
    HP> length of the longest:

    >> substring("abcd", first=1:3, last=4:3)
    HP> [1] "abcd" "bc"   "cd"




    HP> Cheers,
    HP> H.

    HP> -- 
    HP> Hervé Pagès

    HP> Program in Computational Biology
    HP> Division of Public Health Sciences
    HP> Fred Hutchinson Cancer Research Center
    HP> 1100 Fairview Ave. N, M2-B876
    HP> P.O. Box 19024
    HP> Seattle, WA 98109-1024

    HP> E-mail: hpages at fhcrc.org
    HP> Phone:  (206) 667-5791
    HP> Fax:    (206) 667-1319

    HP> ______________________________________________
    HP> R-devel at r-project.org mailing list
    HP> https://stat.ethz.ch/mailman/listinfo/r-devel
#
2010/6/4 Martin Maechler <maechler at stat.math.ethz.ch>:
I agree. The current behaviour is reasonable.

-Deepayan
#
On Jun 4, 2010, at 12:10 PM, Deepayan Sarkar wrote:

            
Yes, but I don't see how it is inconsistent with the docs. It says that it won't recycle, and it doesn't. The fact that the combination of 0-length index and a positive-length x is nonsensical is an orthogonal issue. (Notice, BTW, that
x <- character(0); i <- integer(0); substr(x,i,i) does NOT give an error.)

Of course it is never harmful to be explicit about things....

#
If you want a set of string functions that strive to be simple and
consistent you might want to look at the stringr package.  And since
it's a new package, I'm very keen to remove any inconsistencies.

Hadley
#
I think it would be nice if multiargument vectorized
functions in core R used the rules that are used by
the arithmetic functions (`+`, etc.):
   a) if any argument length is 0, then the output
      length is 0
   b) otherwise the output is the length of the longest
      input
The arithmetic functions also warn if the output length
is not a multiple of some input length.   (They actually
warn 'longer ... length is not a multiple of shorter ...'
and I'm extrapolating that to more than two arguments.)
Most other multi-vectorized functions (e.g., log, pnorm)
don't currently warn.
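
The two rules and the warning can be checked directly at the prompt. A minimal sketch using only the base R `+` operator; the variable names are just for illustration:

```r
# Rule (a): a zero-length argument gives a zero-length result.
zero_len <- length(integer(0) + 1:3)

# Rule (b): otherwise the result is as long as the longest input.
full_len <- length(1:6 + 1:2)

# 3 is not a multiple of 2, so `+` signals a warning here.
warned <- tryCatch({ 1:3 + 1:2; FALSE }, warning = function(w) TRUE)
```

Here `zero_len` is 0, `full_len` is 6 and `warned` is TRUE.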

If they all followed the same rules then it would be easier
to write code involving unfamiliar functions.  The rule
could be stated in one help file and a help file for a
given function could say that arguments x, y, and z,
but not a or b, are 'vectorized', with a link to the one
help file describing vectorization.  Even better, the C
and C++ APIs could be expanded to do the standard
multivectorization so not every function would do it
in its own way.
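
As a rough illustration of what one shared rule could look like, here is an R-level sketch. The suggestion above concerns the C and C++ APIs, so this is only an analogy: `recycle_args()` is a hypothetical helper, not part of base R, and the wording of its warning is made up.

```r
# Hypothetical helper (NOT in base R): recycle arguments following the
# rules used by the arithmetic operators.
recycle_args <- function(...) {
  args <- list(...)
  if (length(args) == 0L) return(list())
  lens <- lengths(args)
  # Rule (a): any zero-length argument means zero-length output;
  # rule (b): otherwise use the length of the longest argument.
  n <- if (any(lens == 0L)) 0L else max(lens)
  # Warn when the output length is not a multiple of some input length.
  if (n > 0L && any(n %% lens != 0L))
    warning("longest argument length is not a multiple of a shorter one")
  # Expand every argument to the common length n.
  lapply(args, rep_len, length.out = n)
}

same_len <- recycle_args(1:2, 1:4, "a")      # every element has length 4
zero_out <- recycle_args(1:3, integer(0))    # every element has length 0
```

A function written on top of such a helper would only need to document which of its arguments are 'vectorized'.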

Some functions cannot be changed to follow that rule because
it would break too much code (e.g., paste() and cat()).
However, why shouldn't substring return character(0) if
any argument is 0 long?
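
For concreteness, the behaviour asked for here could be sketched as a thin wrapper. `substring_zero()` is a hypothetical name; it only adds the zero-length check and otherwise defers to the existing substring():

```r
# Hypothetical wrapper (NOT in base R): return character(0) as soon as
# any argument has length zero, otherwise call substring() as usual.
substring_zero <- function(text, first, last = 1000000L) {
  if (length(text) == 0L || length(first) == 0L || length(last) == 0L)
    return(character(0))
  substring(text, first, last)
}

empty <- substring_zero("abcd", first = 2L, last = integer(0))
usual <- substring_zero("abcd", first = 1:2, last = 4:3)
```

With this wrapper `empty` is character(0) rather than an error, and `usual` is c("abcd", "bc"), the same as plain substring().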

By the way, the 'zero rule' is there so we don't have to
write so many if(length(x)>0) statements around things like
    which(x) + 1
or
    substring(x, 1, nchar(x)-1)
where the scalar 1 would otherwise cause NA's to arise.
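
A small sketch of that point, using only base R. The last line shows what stretching a zero-length vector to length one would produce instead, assuming the fill behaviour described in ?rep:

```r
x <- logical(0)
idx <- which(x) + 1                       # zero-length, no NA

s <- character(0)
trimmed <- substring(s, 1, nchar(s) - 1)  # character(0)

# Without the zero rule, the empty vector would have to be expanded to
# match the scalar 1, and rep() fills such an expansion with NA.
stretched <- rep(integer(0), length.out = 1L)
```

So `idx` and `trimmed` stay empty, while `stretched` is a single NA.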

[Perhaps I should not state my opinion so forcibly, since,
for legal reasons, I'm not in a position to change core R code.]

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com