Message-ID: <24269.6259.934079.331347@stat.math.ethz.ch>
Date: 2020-05-26T13:24:03Z
From: Martin Maechler
Subject: paste(character(0), collapse="", recycle0=FALSE) should be ""
In-Reply-To: <f39343f8-bca9-5204-e2a9-fb4a0f84a77b@fredhutch.org>
>>>>> Hervé Pagès
>>>>> on Sun, 24 May 2020 14:22:37 -0700 writes:
> On 5/24/20 00:26, Gabriel Becker wrote:
>>
>>
>> On Sat, May 23, 2020 at 9:59 PM Hervé Pagès <hpages at fredhutch.org>
>> wrote:
>>
>> On 5/23/20 17:45, Gabriel Becker wrote:
>> > Maybe my intuition is just
>> > different but when I collapse multiple character vectors together, I
>> > expect all the characters from each of those vectors to be in the
>> > resulting collapsed one.
>>
>> Yes I'd expect that too. But the **collapse** operation in paste() has
>> never been about collapsing **multiple** character vectors together.
>> What it does is collapse the **single** character vector that comes out
>> of the 'sep' operation.
>>
>>
>> I understand what it does, I broke it down the same way in my post
>> earlier in the thread. The fact remains that it is a single function,
>> which significantly muddies the waters. So you can say
>>
>> paste0(x,y, collapse=",", recycle0=TRUE)
>>
>> is not a collapse operation on multiple vectors, and of course there's a
>> sense in which you're not wrong (again I understand what these functions
>> do), but it sure looks like one in the invocation, doesn't it?
>>
>> Honestly the thing that this whole discussion has shown me most clearly
>> is that, imho, collapse (accepting ONLY one data vector) and
>> paste (accepting multiple) should never have been a single function to
>> begin with. But that ship sailed long long ago.
> Yes :-(
>>
>> So
>>
>>    paste(x, y, z, sep="", collapse=",")
>>
>> is analogous to
>>
>>    sum(x + y + z)
>>
>>
>> Honestly, I'd be significantly more comfortable if
>>
>> 1:10 + integer(0) + 5
>>
>> were an error too.
> This is actually the recycling scheme used by mapply():
> > mapply(function(x, y, z) c(x, y, z), 1:10, integer(0), 5)
> Error in mapply(FUN = FUN, ...) :
> zero-length inputs cannot be mixed with those of non-zero length
> AFAIK base R uses 3 different recycling schemes for n-ary operations:
> (1) The recycling scheme used by arithmetic and comparison operations
> (Arith, Compare, Logic group generics).
> (2) The recycling scheme used by classic paste().
> (3) The recycling scheme used by mapply().
> Having a core mechanism like recycling be inconsistent across
> base R is sad. It makes it really hard to predict how a given n-ary
> function will recycle its arguments unless you spend some time trying it
> yourself with several combinations of vector lengths. It is of course
> the source of numerous latent bugs. I wish there was only one but that's
> just a dream.
> None of these 3 recycling schemes is perfect. IMO (2) is by far the
> worst. (3) is too restrictive and would need to be refined if we wanted
> to make it a good universal recycling scheme.
> Anyway I don't think it makes sense to introduce a 4th recycling scheme
> at this point even though it would be a nice item to put on the wish
> list for R 7.0.0 with the ultimate goal that it will be universally adopted
> in R 11.0.0 ;-)
> So if we have to make do with what we have, IMO (1) is the scheme that
> makes most sense, although I agree that it can do some surprising things
> for some unusual combinations of vector lengths. It's the scheme I adhere
> to in my own binary operations, e.g. in S4Vectors::pcompare().
> The modest proposal of the 'recycle0' argument is only to let the user
> switch from recycling scheme (2) to (1) if they're not happy with scheme
> (2) (I'm one of them).
Yes, indeed. This was the purpose of introducing 'recycle0'.
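To make the three schemes concrete, here is a minimal sketch in R. It
assumes a version of R whose paste() already has the 'recycle0'
argument; the variable names are only for illustration.

```r
# Scheme (1), arithmetic: a zero-length operand gives a zero-length result.
arith <- 1:3 + integer(0)                               # integer(0)

# Scheme (2), classic paste(): a zero-length argument is treated as "".
classic <- paste("a", character(0))                     # "a "

# Scheme (3), mapply(): mixing zero-length and non-zero-length inputs
# signals an error, which is caught here and replaced by a marker string.
strict <- tryCatch(mapply(function(x, y) c(x, y), 1:3, integer(0)),
                   error = function(e) "error")

# recycle0 = TRUE switches the sep phase of paste() from scheme (2) to (1).
switched <- paste("a", character(0), recycle0 = TRUE)   # character(0)
```

None of these calls uses 'collapse', so they only show the sep phase.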
Now, with collapse = <string> {in R, "string" := character vector of length 1},
we clearly see different interpretations of what is desirable
for recycle0 = TRUE.
All of you (Suharto, Bill, Hervé, Gabe) assert that the behavior
should be different from the current one, and should either signal an
error (possibly; Gabe's suggestion) or return a single string (possibly
with a warning), i.e., the collapse = <string> behavior should not be
influenced by (or possibly conflict with) recycle0=TRUE.
Within R core, some believe the current recycle0=TRUE behavior to
be the correct one. Personally, I see
reasons for both.
What about remaining back-compatible, not only to R 3.y.z with
default recycle0=FALSE, but also to R 4.0.0 with recycle0=TRUE
*and* add a new option for the Suharto-Bill-Hervé-Gabe behavior,
e.g., recycle0="sep.only" or just recycle0="sep" ?
As (for back-compatibility reasons) you have to specify
'recycle0 = ..' anyway, you would get what makes most sense to
you by using such a third option.
(WDYT?)
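A rough sketch of how the three choices could differ, written as a
wrapper so that it does not depend on what paste() itself does with
'recycle0'. Note that paste3() and the value "sep" are hypothetical and
not an existing API, and that the recycle0 = TRUE branch reflects my
reading of the behavior under discussion (zero length carried through
the collapse phase), which is an assumption.

```r
# paste3() is a hypothetical wrapper, not part of base R.
# recycle0 may be FALSE, TRUE, or "sep" (the proposed third option).
paste3 <- function(..., sep = " ", collapse = NULL, recycle0 = FALSE) {
  stopifnot(isTRUE(recycle0) || identical(recycle0, FALSE) ||
            identical(recycle0, "sep"))
  zero <- any(lengths(list(...)) == 0L)
  if (identical(recycle0, FALSE) || !zero) {
    # Classic behavior: zero-length arguments are treated as "".
    return(paste(..., sep = sep, collapse = collapse))
  }
  # From here on some argument has length zero and recycle0 is TRUE or
  # "sep", so the sep phase yields character(0).
  if (is.null(collapse) || isTRUE(recycle0)) {
    # Assumed recycle0 = TRUE behavior: zero length also carries
    # through the collapse phase.
    return(character(0))
  }
  # recycle0 = "sep": zero-length recycling applies to the sep phase
  # only, so collapsing character(0) gives a single string.
  paste(character(0), collapse = collapse)
}

paste3("a", character(0), collapse = "+")                    # "a "
paste3("a", character(0), collapse = "+", recycle0 = TRUE)   # character(0)
paste3("a", character(0), collapse = "+", recycle0 = "sep")  # ""
```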
Martin
> Switching to scheme (3) or to a new custom scheme
> would be a completely different proposal.
>>
>> At least I'm consistent right?
> Yes :-)
> Anyway discussing recycling schemes is interesting but not directly
> related to what the OP brought up (behavior of the 'collapse' operation).
> Cheers,
> H.
>>
>> ~G