ifelse problem - bug or operator error
On Fri, Aug 24, 2012 at 7:29 PM, Jennifer Sabatier
<plessthanpointohfive at gmail.com> wrote:
AAAAAHHHHHHH I GOT IT!!!!!!!!!! And I *think* I understand floating point arithmetic...
Well then, you're doing much better than the rest of us: it's quite a difficult subject, and it only gets trickier the more you think about it. (I mean numerical analysis generally, not the definition of an IEEE 754 / IEC 60559 double.) You even get such fun as -1 * 0 and 1 * 0 being two different values, a negative and a positive zero, that compare equal with == but behave differently elsewhere, for example under division.
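For the signed-zero remark, here is a short sketch you can paste into R. It assumes the usual IEEE 754 doubles, which is what R uses on essentially all current platforms.

```r
# -1 * 0 produces a negative zero; 1 * 0 produces a positive zero
neg_zero <- -1 * 0
pos_zero <- 1 * 0

neg_zero == pos_zero   # TRUE: == treats the two zeros as equal
1 / neg_zero           # -Inf: the sign of the zero survives division
1 / pos_zero           # Inf
```

Both zeros print as 0, so the difference only shows up when an operation, such as division, depends on the sign.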
In this case vn$PM.DIST.TOT is a sum of proportions, so it should be anywhere between 0 and 1. If it's anything other than 1 when vn$PM.EXP is greater than 0, then something is wrong with one of the variables used to compute vn$PM.DIST.TOT. I was worried that making it an integer would turn cases like 0.4 into 0 and make them look legal when they're not (though that doesn't actually seem to be a problem here). So I did what Michael and Peter suggested, after reading up on floating point:
fpf <- 1e-05 # fpf = floating point fuzz
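The integer worry is justified, and it cuts both ways: as.integer() truncates toward zero instead of rounding. The sketch below uses a hypothetical value, ten additions of 0.1, to stand in for a sum of proportions that should be exactly 1 but has picked up rounding error.

```r
# as.integer() truncates toward zero; it does not round
as.integer(0.4)                         # 0: a bad proportion becomes 0

# Hypothetical sum of proportions that "should" be 1
near_one <- Reduce(`+`, rep(0.1, 10))   # ten double-precision additions
near_one < 1                            # TRUE: it lands slightly below 1
as.integer(near_one)                    # 0: truncation turns it into 0, not 1
round(near_one)                         # 1: rounding recovers the intended value
```

So converting with as.integer() could have flagged rows that were actually fine. A tolerance-based comparison avoids that problem.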
Though I suggested 1e-05 here, people usually use a slightly more stringent tolerance: a common rule of thumb is the square root of machine precision, which in R is sqrt(.Machine$double.eps) (about 1.5e-08).
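As a small illustration of that rule of thumb, the sketch below compares a near-1 sum, again a hypothetical value built from ten additions of 0.1, against 1 in three ways. all.equal() uses the same default tolerance, but it returns a single summary for the whole vector rather than a value per element, so it is wrapped in isTRUE() here. For per-row flags, the abs() form is the one to use.

```r
tol <- sqrt(.Machine$double.eps)   # about 1.5e-08 for IEEE 754 doubles
x <- Reduce(`+`, rep(0.1, 10))     # mathematically 1, stored slightly below 1

x == 1                             # FALSE: exact comparison fails
abs(x - 1) < tol                   # TRUE: equal within tolerance
isTRUE(all.equal(x, 1))            # TRUE: all.equal() defaults to this tolerance
```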
vn$PM.DIST_flag <- ifelse(vn$PM.EXP > 0 & abs(vn$PM.DIST.TOT - 1) > fpf, 1, 0)
YAAAAAYYYYY!!!! Thanks, solved AND I learned something new. Thanks, all, and have a GREAT weekend! Jen
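To see why the fuzz fixes the flag, here is a sketch on a hypothetical five-row stand-in for vn. The column names are borrowed from the thread, but the values are invented. Row 2 holds a sum that is mathematically 1 but stored slightly below it, and row 4 holds a genuinely bad total of 0.4.

```r
# Hypothetical toy data; not the original poster's data
vn <- data.frame(
  PM.EXP      = c(0, 2, 3, 1, 5),
  PM.DIST.TOT = c(0, Reduce(`+`, rep(0.1, 10)), 1, 0.4, 1)
)
fpf <- 1e-05  # floating point fuzz

# Exact comparison: row 2 is wrongly flagged along with row 4
vn$naive_flag <- ifelse(vn$PM.EXP > 0 & vn$PM.DIST.TOT != 1, 1, 0)

# Tolerance-based comparison: only row 4 (0.4) is flagged
vn$PM.DIST_flag <- ifelse(vn$PM.EXP > 0 & abs(vn$PM.DIST.TOT - 1) > fpf, 1, 0)

vn
```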
Just for the "macro take-away": this is why we prefer dput() output to console printout when you show a problem. If you dput() the original numbers, before they go through ifelse(), you'll see that they really aren't 1's; they only look like 1's because regular printing rounds them. Cheers, and don't forget the old adage: 0.1*10 is hardly ever 1.
Michael
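The sketch below shows how default printing hides the stored value. It uses 0.1 + 0.2 as a hypothetical example. One caveat, as far as I know: dput() also deparses doubles to about 15 significant digits by default, so a discrepancy in the last bits can still be hidden there. sprintf("%.17g", x) shows the full stored value.

```r
# Hypothetical example: 0.1 + 0.2 is stored as slightly more than 0.3
x <- 0.1 + 0.2
print(x)             # shows 0.3 (default printing uses 7 significant digits)
x == 0.3             # FALSE
sprintf("%.17g", x)  # "0.30000000000000004", the full stored value
```

(On IEEE doubles, 0.1 * 10 happens to round to exactly 1, but repeatedly adding 0.1 does not. The adage is really about that kind of accumulated error.)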
On Fri, Aug 24, 2012 at 6:27 PM, Peter Ehlers <ehlers at ucalgary.ca> wrote:
I see that you got other responses while I was composing an answer.
Your 'example.csv' did come through for me, but I still can't
replicate your PM.DIST_flag variable. Specifically, observations
30, 33, 36 and 40 are wrong.
I agree with Rui that there's something else going on. The data
you've sent can't be the data that yielded the 'flag' variable,
or you didn't use the ifelse() function in the way that you've
shown.
I would start with a clean R session and I would use the 'convert
logical to numeric' idea (or keep a logical rather than numeric
flag):
vn <- transform(vn,
my_flag = ( (PM.EXP > 0) & (PM.DIST.TOT != 1) ) * 1 )
It looks as though your PM.DIST.TOT variable is meant to be
integer-valued. If so, you might want to make sure it is stored as
that type. Otherwise, you might want to use Michael's suggestion of
using abs(... - 1) < 1e-05 as the test for equality.
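Here is a sketch that combines the transform() idea with a tolerance. The three-row data frame is hypothetical, and tol = 1e-05 is the value from the thread. Because "equal within tolerance" is abs(x - 1) < tol, the flag uses the opposite test, abs(x - 1) > tol.

```r
# Hypothetical toy data; row 2 is mathematically 1 but stored slightly below it
vn <- data.frame(PM.EXP      = c(0, 2, 1),
                 PM.DIST.TOT = c(0, Reduce(`+`, rep(0.1, 10)), 0.4))
tol <- 1e-05

vn <- transform(vn,
                # exact test: also flags the near-1 row
                exact_flag = ((PM.EXP > 0) & (PM.DIST.TOT != 1)) * 1,
                # tolerance test: flags only rows that are really not 1
                fuzzy_flag = ((PM.EXP > 0) & (abs(PM.DIST.TOT - 1) > tol)) * 1)
vn
```

Dropping the `* 1` keeps the flags logical, which is the other option mentioned above.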
Peter Ehlers
On 2012-08-24 14:56, Jennifer Sabatier wrote:
Hi Michael, Thanks for letting me know how to post data. I will try to upload it that way in a second. I can usually use code to make a reproducible dataset, but this time, with ifelse() behaving strangely (or, more likely, with me doing something wrong), I didn't think I could do it easily, so I figured I would just put my data up. I will check out the R FAQ you mentioned. Thanks again, Jen
On Fri, Aug 24, 2012 at 5:50 PM, R. Michael Weylandt <michael.weylandt at gmail.com> wrote:
On Fri, Aug 24, 2012 at 4:46 PM, Jennifer Sabatier <plessthanpointohfive at gmail.com> wrote:
Hi Michael, No, I never use attach(), exactly for the reasons you state. To do due diligence, I searched my code for the function and it didn't come up (I would have been shocked, because I never use it!). Now that the real data is up, does your suggestion still apply? I am reading it now.
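One common way attach() produces confusing results is that it puts a copy of the data frame on the search path. Later changes to the data frame are then invisible to the attached names. The sketch below demonstrates this with a hypothetical one-column stand-in for vn.

```r
# Hypothetical data; attach() copies vn onto the search path
vn <- data.frame(PM.EXP = c(0, 2, 1))
attach(vn)

vn$PM.EXP <- c(9, 9, 9)   # modify the data frame after attaching...
stale <- PM.EXP           # ...but PM.EXP still finds the old copy: 0 2 1

detach("vn")              # clean up the search path
```

Using vn$PM.EXP or with(vn, ...) avoids the stale copy.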
If you mean the data you sent to Peter, it got scrubbed by the list servers as well (they are somewhat draconian, but appropriately so in the long run).

The absolute best way to send R data by email (especially on this list) is to use the dput() function, which creates a plain-text representation of your data _exactly_ as R sees it. It's a little hard for the untrained eye to parse (I can usually follow about 90% of it, but there's some stuff with rownames = NA I've never looked into), but it's perfectly reproducible in a different R session. Then us having the same data is a simple copy+paste away. For more on dput() and reproducibility generally, see
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

It could be the floating point thing (it's hard to say without knowing how your data was calculated), but Rui seems to think not. M
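Here is a sketch of the dput() round trip on a hypothetical three-row data frame. The "rownames = NA" part is the compact row-name form, row.names = c(NA, -3L), which means "three automatic row names". The values are short decimals, so they survive being written out as text and read back.

```r
# Hypothetical data frame standing in for vn
vn <- data.frame(PM.EXP = c(0, 2, 1), PM.DIST.TOT = c(0, 1, 0.4))

dput(vn)   # prints a structure(list(...), class = "data.frame", row.names = c(NA, -3L)) call

# What a reader does with the pasted text: parse it and evaluate it
txt <- capture.output(dput(vn))
vn_copy <- eval(parse(text = txt))
identical(vn, vn_copy)
```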
Thanks, Jen
On Fri, Aug 24, 2012 at 5:38 PM, R. Michael Weylandt <michael.weylandt at gmail.com> wrote:
Off the wall / wild guess: do you use attach() frequently? I'm not entirely sure how it would come into play here, but it tends to cause weird errors like this. M
On Fri, Aug 24, 2012 at 4:36 PM, Jennifer Sabatier <plessthanpointohfive at gmail.com> wrote:
Hi Rui, Thanks so much for responding, but I think that, because of my HTML problem, the vn data you made isn't the same as mine. I tried running your code on the data (I uploaded a copy) and got the same result I had before. Jen
On Fri, Aug 24, 2012 at 5:28 PM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.