
cut2 not binning interval endpoints correctly

6 messages · Maximilian Butler, S Ellison, jim holtman +1 more

#
See R FAQ 7.31 ("Why doesn't R think these numbers are equal?").

On Nov 25, 2013, at 9:01, Maximilian Butler <maximilian.butler at gmail.com> wrote:

            
#
Maybe. But 

#and
0.308 == seq(0, 0.310, 0.001)[309]
# [1] TRUE

seems to suggest that, while some oddities may be explained by finite precision, 0.308 matches the corresponding element of the cut sequence exactly here, so 0.308 should be OK.

# In addition, extending the OP's example
library(Hmisc)  # provides cut2()
df <- data.frame(x = c(0.308, 0.422, 0.174, 0.04709))
df$bucket <- cut2(df$x, seq(0, 1, 0.001), oneval = FALSE)
df$cutR <- cut(df$x, seq(0, 1, 0.001), right = FALSE)
df

#         x        bucket          cutR
# 1 0.30800 [0.307,0.308) [0.308,0.309)
# 2 0.42200 [0.421,0.422) [0.422,0.423)
# 3 0.17400 [0.173,0.174) [0.173,0.174)
# 4 0.04709 [0.047,0.048) [0.047,0.048)

implies that cut2 is not doing the same thing as cut despite the same intended outcome (at least on R 3.0.1, my present version at work).
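
As a point of reference for the comparison above, here is a small base-R sketch of how cut() itself treats a value sitting exactly on a break under each setting of right. The three-element breaks vector is made up for illustration, and nothing here is a claim about what cut2 does internally.

```r
# Illustrative breaks only; 0.308 sits exactly on the middle break.
x <- 0.308
brk <- c(0.307, 0.308, 0.309)

lower_closed <- cut(x, brk, right = FALSE)  # intervals of the form [a, b)
upper_closed <- cut(x, brk, right = TRUE)   # intervals of the form (a, b]

as.character(lower_closed)  # "[0.308,0.309)"
as.character(upper_closed)  # "(0.307,0.308]"
```

A boundary value drops down one bin as soon as the intervals are closed on the right, which is the same kind of one-bin shift seen in the bucket column.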

This may be one for Frank Harrell ...

S Ellison



1 day later
#
Um... I think I did. But I'm not sure you did.... 
print(..., digits=20) has used different numbers of digits for your two print()s, probably because print() decided it needed more digits for the multi-valued vector. The internal representations were the same. Try

print(seq(0, 0.310, 0.001)[309], digits = 20)
[1] 0.307999999999999996

print(seq(0, 0.310, 0.001)[309], digits = 22)
[1] 0.3079999999999999960032

0.308 does match the cut boundary 'exactly' in this case (which is why the usually unwise '==' returned TRUE), though neither is exactly 0.308. 
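
One way to check this without depending on how print() chooses its digits is to compare the two doubles directly and format both with a fixed number of decimals. This is only a base-R sketch; using sprintf() with 22 decimals is an assumption made for illustration, not part of the code posted in the thread.

```r
lit <- 0.308                       # the literal as R parses it
brk <- seq(0, 0.310, 0.001)[309]   # the matching element of the break sequence

same_double <- identical(lit, brk) # TRUE means bit-for-bit the same double
shown <- sprintf("%.22f", c(lit, brk))  # fixed formatting, independent of vector length

same_double
shown
```

If both strings agree and identical() is TRUE, the stored values are the same even though neither is the decimal number 0.308.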

Nonetheless, I understand that FAQ 7.31 is a good candidate for other 'unexpected' cut2 results. However, that isn't the whole story. It doesn't explain the corresponding cut(, right=FALSE) result, which should give the same answer as cut2 if finite representation were the sole cause. So there's summat else going on.


Steve E



#
You can look at the source code of Hmisc::cut2() to see what is going on -- it does a lot more than calling cut() with different default arguments.  Another approach to debugging this is to use trace() to see what cut2() passes down to the default cut method:
Tracing function "cut.default" in package "base"
[1] "cut.default"
Tracing cut.default(x, k2) on entry 
   x= 0.308 
   breaks= c(0.3045, 0.3055, 0.3065, 0.3075, 0.3085, 0.3095, 0.3105, 0.3115,  0.3125, 0.314)
[1] [0.308,0.309)
9 Levels: [0.305,0.306) [0.306,0.307) [0.307,0.308) ... [0.313,0.314]
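
The call that produced the trace above did not survive in the archive. A sketch of how such a trace might be set up follows. The tracer expression and the cuts passed to cut2() are guesses chosen to resemble the output shown, not the original code, and the cut2() call only runs if Hmisc is installed.

```r
# Tracer: print the arguments cut.default() receives each time it is entered.
tracer <- quote(cat("   x=", deparse(x), "\n   breaks=", deparse(breaks), "\n"))

if (requireNamespace("Hmisc", quietly = TRUE)) {
  trace("cut.default", tracer = tracer)
  tryCatch(
    # Assumed inputs: one value and ten evenly spaced cuts around it.
    print(Hmisc::cut2(0.308, seq(0.305, 0.314, by = 0.001), oneval = FALSE)),
    finally = untrace("cut.default")  # always remove the trace afterwards
  )
} else {
  message("Hmisc is not installed; skipping the cut2() trace.")
}
```

Wrapping the call in tryCatch() with finally means the trace is removed even if cut2() stops with an error.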

I.e., this has little to do with floating point errors in cut(). 
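
To see how the shifted breaks in the trace lead to the bin reported above, the same breaks can be fed to cut() by hand. This is a base-R sketch using the break values printed in the trace; matching the resulting bin to the label [0.308,0.309) is an interpretation of that output, not something checked against the cut2 source here.

```r
# Breaks copied from the trace output above.
brk <- c(0.3045, 0.3055, 0.3065, 0.3075, 0.3085, 0.3095,
         0.3105, 0.3115, 0.3125, 0.314)

# Default cut(): intervals are (a, b], so 0.308 lands in (0.3075, 0.3085].
bin <- cut(0.308, brk, labels = FALSE)
bin                                      # 4
c(lower = brk[bin], upper = brk[bin + 1])
```

With breaks moved half a step below the nominal cuts, 0.308 sits well inside an interval rather than on its edge, so which side is closed no longer matters for this value.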

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com