sum() vs cumsum() implicit type coercion
(If I may be so bold: although I think it's unlikely that a majority would be in favour of this change, and I doubt anyone is actually proposing it, I think quite a bit more than "a majority" should be required before a change like this is allowed. Considering that cumsum's coercion to numeric is documented, that consistency of type coercion between sum and cumsum has never been advertised, and that a custom version of cumsum that addresses the inconsistency would be very easy for users to create themselves, I struggle to see how the change could ever have merit. Even public unanimity would probably not be enough.)

On Tue, 25 Aug 2020 at 20:25, Martin Maechler
<maechler at stat.math.ethz.ch> wrote:
Tomas Kalibera
on Tue, 25 Aug 2020 09:29:05 +0200 writes:
> On 8/23/20 5:02 PM, Rory Winston wrote:
>> Hi
>>
>> I noticed a small inconsistency when using sum() vs cumsum()
>>
>> I have a char-based series
>>
>> > tryjpy$long
>>
>> [1] "0.0022" "-0.0002" "-0.0149" "-0.0023" "-0.0342" "-0.0245" "-0.0022"
>>
>> [8] "0.0003" "-0.0001" "-0.0004" "-0.0036" "-0.001" "-0.0011" "-0.0012"
>>
>> [15] "-0.0006" "0.0016" "0.0006"
>>
>> When I run sum() vs cumsum() , sum fails but cumsum converts the
>> series to numeric before summing:
>>
>>> sum(tryjpy$long)
>> Error in sum(tryjpy$long) : invalid 'type' (character) of argument
>>
>>> cumsum(tryjpy$long)
>> [1] 0.0022 0.0020 -0.0129 -0.0152 -0.0494 -0.0739 -0.0761 -0.0758 -0.0759
>> [10] -0.0763 -0.0799 -0.0809 -0.0820 -0.0832 -0.0838 -0.0822 -0.0816
>>
>> Which I guess is due to the following line in do_cum():
>>
>> PROTECT(t = coerceVector(CAR(args), REALSXP));
>> This might be fine and there may be very good reasons why there is no
>> coercion in sum - just seems a little inconsistent in usage
> Yes. I don't know the reason for this design, but please note it is
> documented in ?sum and in ?cumsum, which would also make it harder to
> change. One can always use a consistent subset (not rely on the coercion
> e.g. from characters).
> Best
> Tomas
Indeed. Further note that most arithmetic/math *fails* on character vectors, so if a change were to be made, it should rather be such that cumsum() also rejects character input.

We would have consistency then, but would potentially break user code, and even package code, that has hitherto assumed that cumsum() coerces to numeric first.

If a majority of commentators and R core think we should make such a change, I'd agree to consider it. Otherwise, we save (ourselves and others) a bit of time.

Martin
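
For illustration only, the user-level workaround mentioned in this thread (a custom cumsum that does not rely on the coercion) might be sketched as below. The name cumsum_strict is hypothetical and not part of base R, the error message is simply borrowed from the one sum() gives in the original report, and the three input values are a subset of the series posted there.

```r
# Hypothetical helper (not part of base R): a cumsum() that, like sum(),
# refuses character input instead of silently coercing it to numeric.
cumsum_strict <- function(x) {
  if (is.character(x)) {
    stop("invalid 'type' (character) of argument")
  }
  cumsum(x)
}

# A few values from the character series in the original report
long <- c("0.0022", "-0.0002", "-0.0149")

# Converting explicitly makes sum() and the cumulative sum behave alike
x <- as.numeric(long)
sum(x)
cumsum_strict(x)

# Character input is now rejected in both cases
try(sum(long))
try(cumsum_strict(long))
```

Converting with as.numeric() up front is the "consistent subset" approach: the code then no longer depends on how either function treats character vectors.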
______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel