ifelse problem - bug or operator error

21 messages · Bert Gunter, Rui Barradas, R. Michael Weylandt +2 more

#
On Fri, Aug 24, 2012 at 3:22 PM, Jennifer Sabatier
<plessthanpointohfive at gmail.com> wrote:
You probably should have: dput() makes it super easy as well.
Indeed it makes no sense to me either because you sent HTML email
which got mangled by the server.
First guess: standard problems with equality of floating point
numbers. (See R FAQ 7.31 for the details.)

You probably want to change

x == 1

to

abs(x - 1) < 1e-05

or something similar.
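
A minimal illustration of the issue FAQ 7.31 describes, using made-up values rather than the poster's data: two numbers that look the same can still fail an exact == test, while a comparison with a tolerance succeeds. The 1e-05 tolerance is the one suggested above; all.equal() is base R's built-in alternative.

```r
# Illustrative values only; not taken from the original data.
x <- 0.1 + 0.2

exact_equal  <- x == 0.3                    # exact comparison of doubles
close_enough <- abs(x - 0.3) < 1e-05        # comparison with a tolerance
base_r_check <- isTRUE(all.equal(x, 0.3))   # all.equal() applies its own small tolerance

print(c(exact = exact_equal, tolerance = close_enough, all.equal = base_r_check))
```

Wrapping all.equal() in isTRUE() matters because it returns a descriptive string, not FALSE, when the values differ.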

Cheers,
Michael
#
Oops, sorry, I thought I was in plain text.  I can't tell the
difference because I use so little formatting in my emails.

Try this (a truncated version since I have to hand space everything):

PM.EXP   PM.DIST.TOT   PM.DIST_flag
0        0             0
6417     1             1
23       1             0
97691    2             1
0        0             0
33993    2             1


On Fri, Aug 24, 2012 at 4:36 PM, R. Michael Weylandt
<michael.weylandt at gmail.com> wrote:
#
... and if Michael is correct, there is a lesson here: Think of how
much time and aggravation you would have saved yourself if you had
FIRST made an effort to read the docs. The FAQ's are there for a
reason. As is An Introduction to R, which also should be read before
posting on this list.

If Michael is wrong, then the above homily should be amended to:
If you have not already done so, read the FAQ's and An Introduction to
R. You will save yourself -- and us -- much time and effort by doing
so.

Thus endeth the lesson.

-- Bert

On Fri, Aug 24, 2012 at 1:36 PM, R. Michael Weylandt
<michael.weylandt at gmail.com> wrote:

  
    
#
On 2012-08-24 13:22, Jennifer Sabatier wrote:
_Many_ instances?? I see only 4 such cases. Still not good, though.
Here's what you should do:

1. Don't send html mail.
2. use simple variable names (x,y,z would do fine here).
3. either provide the data or at least a part of it with dput() or
    at least provide str(vn).
4. when you're trying to decide between operator error and bug, go
    with the operator error theory. You'll be correct at least 99% of
    the time.

I ran your command on the above (suitably deciphered) data and had no
problem getting what I think you expect (i.e. the four suspect cases
came out just as they should). But what my mailer provides as your
data may not be what you really have.

Oh, and get a bandage for that head bruise.

Peter Ehlers
#
Bert,

I will thank you not to condescend to me, as I am too damn old (40) to
be treated that way.  You didn't even offer a solution to my problem.
You only came to chastise me with regards to your assumptions about
me, which is very annoying.

While I am at the beginner level of R, I am not an idiot.  Or an
infant.  I've been a statistician for more than 12 years, mostly
programming in SAS.  I've been moving into R for the past 3 years.

I am EXTENSIVELY familiar with the documentation.  The documentation
is how I've learned to code in R.

I NEVER post to R-help until I've exhausted my own ability to solve
the problem, INCLUDING combing through the documentation.  And looking
at other requests for help to see if the problem has arisen for
someone else.

One of the reasons I wait until the last minute to post to R-help is
because it is a very snobby and often unfriendly list, as your comment
illustrates.  You just assumed I hadn't already looked through the
docs.  Nice.

The reality of it is, what Michael pointed out to me wasn't something
I'd discovered in my own search of the docs.  And that, Bert, is why
someone like me comes to R-help.

Please, Bert, don't we want the whole world to use R?  They won't if
the community is so unwelcoming.

Now, if you have a solution, please post it.  If not, leave off while
I explore Michael's suggestion.

Best,

Jen
On Fri, Aug 24, 2012 at 4:49 PM, Bert Gunter <gunter.berton at gene.com> wrote:
#
Hi Peter,

I'm really sorry, I thought I was in plain text.  I don't use any
formatting in my emails and in Gmail the HTML looks the same as plain
text.

Anyway, I've attached the data (I didn't think we could do that but I
am frequently wrong).

I say many cases because this is just a subset of >300 observations.

The error seems to happen without a pattern I can discern.  I am
assuming I am doing something wrong.

Thanks,

Jen
On Fri, Aug 24, 2012 at 5:17 PM, Peter Ehlers <ehlers at ucalgary.ca> wrote:
#
Hello,

Michael's standard guess, FAQ 7.31, was also mine, but is wrong. The 
error is in Jennifer's flag column,  not the result of her ifelse. (!)


x <- scan(what="character", text="
  PM.EXP PM.DIST.TOT PM.DIST_flag  0 0 0  0 0 0  0 0 0  177502 1 0 31403 1
0  0 0 0  1100549 1 0  38762 1 0  0 0 0  20025 1 0  0 0 0  13742 1 0  0 0 0
83078 1 0  0 0 0  0 0 0  0 0 0  0 0 0  0 0 0  0 0 0  0 0 0  0 0 0  0 0 0
165114 1 0  0 0 0  417313 1 0  3546 1 0  4613 1 0  225460 1 0  6417 1 1  23
1 0  3402 1 0  8504 1 1  8552 1 0  9723 1 0  37273 1 1  396 1 0 1478 1 0
2074 1 0  12220 1 1  97691 2 1  0 0 0  33993 2 1
")

vn <- matrix(as.numeric(x[-(1:3)]), ncol=3, byrow=TRUE)
vn <- data.frame(vn)
names(vn) <- x[1:3]
str(vn)
vn$flag2 <- ifelse( (vn$PM.EXP > 0.0) & (vn$PM.DIST.TOT != 1.0), 1, 0 )

str(vn)
identical(vn$PM.DIST_flag, vn$flag2)  # FALSE
inx <- vn$PM.DIST_flag != vn$flag2
vn[inx, ]

As you can see, there's nothing wrong with ifelse. The second standard
guess is that Jennifer had something, a variable, in her session
interfering with the variables involved in the ifelse.

Also, when the result of an if/else or ifelse is, in this order, 1 or 0,
the following can be used, with performance benefits.

# make T/F numeric 1/0
vn$flag3 <- 1 * (vn$PM.EXP > 0.0 & vn$PM.DIST.TOT != 1.0)
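
A small sketch of the equivalence described above, on hypothetical stand-in values (the column names mirror the thread; the data frame vn_demo is invented for illustration). It compares ifelse() with direct coercion of the logical condition.

```r
# Hypothetical stand-in values; column names follow the thread's data.
vn_demo <- data.frame(PM.EXP = c(0, 6417, 23, 97691),
                      PM.DIST.TOT = c(0, 1, 1, 2))

cond <- vn_demo$PM.EXP > 0 & vn_demo$PM.DIST.TOT != 1  # logical vector

flag_ifelse  <- ifelse(cond, 1, 0)  # explicit choice per element
flag_numeric <- 1 * cond            # TRUE/FALSE coerced to 1/0
flag_integer <- as.integer(cond)    # same values, stored as integer

print(data.frame(flag_ifelse, flag_numeric, flag_integer))
```

All three columns carry the same 0/1 values; keeping cond itself as a logical flag is another reasonable choice.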

Hope this helps,

Rui Barradas
Em 24-08-2012 21:43, Jennifer Sabatier escreveu:
#
BTW - no one else who has replied to this topic was snobby or
unfriendly and I thank you very much for trying to help me.

It's just that Bert is not the first to respond to my request for help
in that way.  As someone looking forward to becoming an advanced R
programmer in my statistical work it is discouraging to be castigated
FOR ASKING FOR HELP.

Thank you, all.

Best,

Jen
#
Hi Rui,

Thanks so much for responding but I think with my HTML problem the vn
data you made must not be the same.  I tried running your code on the
data (I uploaded a copy) and I got the same thing I had before.

Jen
On Fri, Aug 24, 2012 at 5:28 PM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
#
Off the wall / wild guess, do you use attach() frequently? Not
entirely sure how it would come up, but it tends to make weird errors
like this occur.

M

On Fri, Aug 24, 2012 at 4:36 PM, Jennifer Sabatier
<plessthanpointohfive at gmail.com> wrote:
#
Hi Michael,

No, I never use attach(), exactly for the reasons you state.  To do
due diligence I did a search of code for the function and it didn't
come up (I would have been shocked because I never use it!).

Now that real data is up, does your suggestion still apply?  I am
reading it now.

Thanks,

Jen

On Fri, Aug 24, 2012 at 5:38 PM, R. Michael Weylandt
<michael.weylandt at gmail.com> wrote:
#
No data arrived to me.

Rui Barradas
Em 24-08-2012 22:46, Jennifer Sabatier escreveu:
#
On Fri, Aug 24, 2012 at 4:46 PM, Jennifer Sabatier
<plessthanpointohfive at gmail.com> wrote:
If you mean the data you sent to Peter, it got scrubbed by the list
servers as well (they are somewhat draconian, but appropriately so in
the long run). The absolute best way to send R data via email (esp on
this list) is to use the dput() function which will create a plain
text representation of your data _exactly_ as R sees it. It's a little
hard for the untrained eye to parse (I can usually get about 90% of
what it all means but there's some stuff with rownames = NA I've never
looked into) but it's perfectly reproducible to a different R session.
Then us having the same data is a simple copy+paste away.

For more on dput() and reproducibility generally, see
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
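
A minimal sketch of the dput() workflow described above, assuming a hypothetical three-row data frame (vn_demo is invented; only the column names come from the thread). The text that dput() prints is what would be pasted into the body of an email, and the recipient evaluates it to rebuild the object.

```r
# Hypothetical three-row example; column names follow the thread.
vn_demo <- data.frame(PM.EXP = c(0, 6417, 23),
                      PM.DIST.TOT = c(0, 1, 1))

# dput() prints R code that rebuilds the object; it is captured as text here
# to stand in for pasting it into an email.
txt <- capture.output(dput(vn_demo))
cat(txt, sep = "\n")

# The recipient evaluates the pasted text to get the object back.
vn_copy <- eval(parse(text = txt))
print(all.equal(vn_demo, vn_copy))
```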

It could be the floating point thing (it's hard to say without knowing
how your data was calculated), but Rui seems to think not.

M
#
On Fri, Aug 24, 2012 at 4:50 PM, R. Michael Weylandt
<michael.weylandt at gmail.com> wrote:
Note that the dput() output is to be put into the body of the email,
not in an attachment or we're back where we started ;-)

M
#
Oh, sorry, I first thought you couldn't post data to the list, but then
I thought I remembered other people doing so, so I tried to post it.

Here is a copy.

Thanks,

Jen
On Fri, Aug 24, 2012 at 5:49 PM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
#
Hi Michael,

Thanks for letting me know how to post data.  I will try to upload it
that way in a second.

I can usually use code to make a reproducible dataset but this time
with the ifelse behaving strangely (perhaps, it's probably me) I
didn't think I could do it easily so I figured I would just put my
data up.

I will check out the R FAQ you mentioned.

Thanks, again,

Jen



On Fri, Aug 24, 2012 at 5:50 PM, R. Michael Weylandt
<michael.weylandt at gmail.com> wrote:
#
I see that you got other responses while I was composing an answer.
Your 'example.csv' did come through for me, but I still can't
replicate your PM.DIST_flag variable. Specifically, observations
30, 33, 36 and 40 are wrong.

I agree with Rui, that there's something else going on. The data
you've sent can't be the data that yielded the 'flag' variable
or you didn't use the ifelse() function in the way that you've
shown.

I would start with a clean R session and I would use the 'convert
logical to numeric' idea (or keep a logical rather than numeric
flag):

   vn <- transform(vn,
           my_flag = ( (PM.EXP > 0) & (PM.DIST.TOT != 1) ) * 1 )

It looks as though your PM.DIST.TOT variable is meant to be
integer. If so, you might want to ensure that it is that type.
Otherwise, you might want to use Michael's suggestion of using
abs(... - 1) < 1e-05.

Peter Ehlers
On 2012-08-24 14:56, Jennifer Sabatier wrote:
#
AAAAAHHHHHHH I GOT IT!!!!!!!!!!

And I *think* I understand about floating point arithmetic..

In this case vn$PM.DIST.TOT is the sum of proportions.  So, it should
be anywhere between 0 and 1.

In our case, if it's anything other than 1 when vn$PM.EXP is greater
than 0  then it means something is wrong with one of the variables
used to sum vn$PM.DIST.TOT.

I was worried making it an integer will cause cases of 0.4 to be 0 and
look legal, when it's not (though it doesn't actually seem to be a
problem).
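
To make the concern about integer conversion concrete, here is a short sketch on illustrative values (not the thread's data). as.integer() drops the fractional part, while round() goes to the nearest whole number.

```r
# Illustrative values only.
x <- c(0.4, 0.9999999, 1, 1.0000001)

as_int  <- as.integer(x)  # drops the fractional part (truncates toward zero)
rounded <- round(x)       # rounds to the nearest whole number

print(data.frame(x, as_int, rounded))
```

Either conversion turns 0.4 into 0, so a genuinely wrong total would no longer stand out; comparing against 1 with a tolerance avoids changing the values at all.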

So, I just did what Michael and Peter suggested, after reading up on
floating points.

fpf <- 1e-05   # fpf = floating point fuzz

vn$PM.DIST_flag<-ifelse(vn$PM.EXP > 0 & abs(vn$PM.DIST.TOT - 1) > fpf , 1, 0)
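
A sketch of why the exact test misfires, on synthetic data and assuming, as described above, that PM.DIST.TOT is built by adding proportions. The proportions and the data frame vn_demo are invented for illustration; fpf is the same fuzz value as above.

```r
# Synthetic data: the proportions below are invented for illustration.
# Adding 0.1 ten times, one addition at a time, accumulates rounding error
# on typical IEEE 754 hardware, so the total is not exactly 1.
total_accum <- Reduce(`+`, rep(0.1, 10))

vn_demo <- data.frame(PM.EXP = c(0, 6417, 23),
                      PM.DIST.TOT = c(0, total_accum, 0.4))

fpf <- 1e-05  # floating point fuzz, as in the message above

# Exact test: flags the second row even though its total is 1 for practical purposes.
flag_exact <- ifelse(vn_demo$PM.EXP > 0 & vn_demo$PM.DIST.TOT != 1, 1, 0)

# Tolerance test: flags only the row whose total is genuinely off (0.4).
flag_fuzzy <- ifelse(vn_demo$PM.EXP > 0 & abs(vn_demo$PM.DIST.TOT - 1) > fpf, 1, 0)

print(cbind(vn_demo, flag_exact, flag_fuzzy))
```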

YAAAAAYYYYY!!!!

Thanks, solved AND I learned something new.

Thanks, all, and have a GREAT weekend!

Jen
On Fri, Aug 24, 2012 at 6:27 PM, Peter Ehlers <ehlers at ucalgary.ca> wrote:
#
On Fri, Aug 24, 2012 at 7:29 PM, Jennifer Sabatier
<plessthanpointohfive at gmail.com> wrote:
Well then you're doing much better than the rest of us: it's quite a
difficult subject and only gets trickier as you think about it more.
(Numerical analysis generally, not the definition of an IEEE 754 / IEC
60559 double.) You even get such fun as

-1 * 0 != 1 * 0.

under some interpretations.
Though I suggested 1e-05 here, usually one uses slightly more stringent
testing: a general rule of thumb is the square root of machine
precision. In R terms,

sqrt(.Machine$double.eps)
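
For a sense of scale, a brief sketch using an illustrative value (not from the thread's data). The documented default tolerance of all.equal() is close to 1.5e-8, which is the same size as this rule-of-thumb value.

```r
tol <- sqrt(.Machine$double.eps)  # about 1.5e-08 on typical platforms
print(tol)

# Illustrative value: differs from 1 by far less than tol.
x <- 1 - 1e-12

print(x == 1)            # exact comparison
print(abs(x - 1) < tol)  # comparison with the rule-of-thumb tolerance
```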
Just for the "macro-take-away": this is the reason we don't really
like console printout instead of dput() to show a problem: if you dput
the original not-yet-ifelse-d numbers, you'll see that they really
aren't 1's, but that they are truncated upon regular printing.
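
A small sketch of the printing effect, using an illustrative value and assuming R's default setting of 7 significant digits for display. Asking for more digits shows that the value is not exactly 1.

```r
# Illustrative value, not from the original data.
x <- 1 - 1e-10

shown_default <- format(x)               # uses the default number of display digits
shown_full    <- format(x, digits = 17)  # enough digits to tell doubles apart
shown_sprintf <- sprintf("%.17g", x)     # same idea via C-style formatting

print(c(shown_default, shown_full, shown_sprintf))
print(x == 1)
```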

Cheers and don't forget the old adage: 0.1*10 is hardly ever 1,
Michael
#
Hah - I guess I didn't mean I understood it in full as I expect I will
run into it again without anticipating it.

But, now that I know the "old adage" I will look there first when I
run into a problem.

Also, I used the square root of machine precision instead - thanks for
that, too.

Thank you, thank you!  I really appreciate all the help.

Best,

Jen

On Fri, Aug 24, 2012 at 10:31 PM, R. Michael Weylandt
<michael.weylandt at gmail.com> wrote: