ifelse problem - bug or operator error
21 messages · Bert Gunter, Rui Barradas, R. Michael Weylandt +2 more
On Fri, Aug 24, 2012 at 3:22 PM, Jennifer Sabatier
<plessthanpointohfive at gmail.com> wrote:
> Hi R-Helpers,
>
> I don't think I need to post a dataset for this question but if I do, I can.
> Anyway, I am having a lot of trouble with the ifelse command.

You probably should have: dput() makes it super easy as well.

> Here is my code:
>
> vn$PM.DIST_flag <- ifelse( (vn$PM.EXP > 0.0) & (vn$PM.DIST.TOT != 1.0), 1, 0 )
>
> And here is my output that doesn't make ANY sense:
>
>  PM.EXP  PM.DIST.TOT  PM.DIST_flag
>       0            0             0
>       0            0             0
>       0            0             0
>  177502            1             0
>   31403            1             0
>       0            0             0
> 1100549            1             0
>   38762            1             0
>       0            0             0
>   20025            1             0
>       0            0             0
>   13742            1             0
>       0            0             0
>   83078            1             0
>       0            0             0
>       0            0             0
>       0            0             0
>       0            0             0
>       0            0             0
>       0            0             0
>       0            0             0
>       0            0             0
>       0            0             0
>  165114            1             0
>       0            0             0
>  417313            1             0
>    3546            1             0
>    4613            1             0
>  225460            1             0
>    6417            1             1
>      23            1             0
>    3402            1             0
>    8504            1             1
>    8552            1             0
>    9723            1             0
>   37273            1             1
>     396            1             0
>    1478            1             0
>    2074            1             0
>   12220            1             1
>   97691            2             1
>       0            0             0
>   33993            2             1

Indeed it makes no sense to me either because you sent HTML email which got mangled by the server.

> As you can see, there are many instances where PM.EXP > 0 and PM.DIST.TOT = 1
> yet PM.DIST_flag = 1 and it should be 0. It should only flag in cases such as
> the last line of data.
>
> WWHHHYYYYYYYY???? Why why why why why why why? Why?
>
> (Sorry, I've been trying to figure this out for hours and I've devolved to
> mumbling in corners and banging my head against the table)
>
> What in the world am I doing wrong? Or is ifelse not the right function?

First guess.... standard problems with equality of floating point numbers. (See R FAQ 7.31 for the details)

You probably want to change x == 1 to abs(x - 1) < 1e-05 or something similar.

Cheers,
Michael

> Best,
> Jen
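A minimal sketch of the floating-point issue Michael's first guess refers to, assuming (hypothetically) that one PM.DIST.TOT value was computed and ended up a hair away from 1. The three-row data frame below is made up for illustration and is not the poster's data; whether this is what actually happened in her session is not established in the thread.

```r
# Hypothetical data: the first total is 1 plus a tiny rounding error.
vn <- data.frame(PM.EXP      = c(6417, 23, 97691),
                 PM.DIST.TOT = c(1 + 1e-10, 1, 2))

# At the default print precision the first two totals look the same.
print(vn)

# Exact comparison: the first row is flagged because its total is not
# bit-for-bit equal to 1.
strict <- ifelse(vn$PM.EXP > 0 & vn$PM.DIST.TOT != 1, 1, 0)

# Tolerance-based comparison, in the spirit of abs(x - 1) < 1e-05:
# a total counts as "not 1" only if it differs by at least tol.
tol <- 1e-05
tolerant <- ifelse(vn$PM.EXP > 0 & abs(vn$PM.DIST.TOT - 1) >= tol, 1, 0)

print(cbind(vn, strict = strict, tolerant = tolerant))
```

The tolerance of 1e-05 is simply the value suggested in the reply; a suitable value depends on how the column was calculated.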
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Oops, sorry, I thought I was in plain text. I can't tell the difference because I use so little formatting in my emails.

Try this (a truncated version since I have to hand space everything):

 PM.EXP  PM.DIST.TOT  PM.DIST_flag
      0            0             0
   6417            1             1
     23            1             0
  97691            2             1
      0            0             0
  33993            2             1
... and if Michael is correct, there is a lesson here: Think of how much time and aggravation you would have saved yourself if you had FIRST made an effort to read the docs. The FAQs are there for a reason. As is An Introduction to R, which also should be read before posting on this list.

If Michael is wrong, then the above homily should be amended to: If you have not already done so, read the FAQs and An Introduction to R. You will save yourself -- and us -- much time and effort by doing so.

Thus endeth the lesson.

-- Bert
Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
On 2012-08-24 13:22, Jennifer Sabatier wrote:
> As you can see, there are many instances where PM.EXP > 0 and PM.DIST.TOT = 1
> yet PM.DIST_flag = 1 and it should be 0. It should only flag in cases such as
> the last line of data.
_Many_ instances?? I see only 4 such cases. Still not good, though.
Here's what you should do:
1. Don't send html mail.
2. use simple variable names (x,y,z would do fine here).
3. either provide the data or at least a part of it with dput() or
at least provide str(vn).
4. when you're trying to decide between operator error and bug, go
with the operator error theory. You'll be correct at least 99% of
the time.
I ran your command on the above (suitably deciphered) data and had no
problem getting what I think you expect (i.e. the four suspect cases
came out just as they should). But what my mailer provides as your
data may not be what you really have.
Oh, and get a bandage for that head bruise.
Peter Ehlers
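Peter's third suggestion (share the data with dput(), or at least show str(vn)) could look roughly like the sketch below. The data frame d and its short column names x and y are hypothetical stand-ins, not the poster's data.

```r
# Hypothetical stand-in for the poster's data frame, with simple names.
d <- data.frame(x = c(0, 6417, 23, 97691),
                y = c(0, 1, 1, 2))

# str() reports the storage type of each column (num, int, chr, ...),
# which is often enough to spot a column that is not what it seems.
str(d)

# dput() writes R code that rebuilds the object; head() keeps it short
# enough to paste into the body of a plain-text email.
dput(head(d, 3))
```

Either output can be pasted straight into a message, so readers work with the same object rather than a hand-spaced table.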
Bert,

I will thank you not to condescend to me, as I am too damn old (40) to be treated that way. You didn't even offer a solution to my problem. You only came to chastise me with regards to your assumptions about me, which is very annoying.

While I am at the beginner level of R, I am not an idiot. Or an infant. I've been a statistician for more than 12 years, mostly programming in SAS. I've been moving into R for the past 3 years. I am EXTENSIVELY familiar with the documentation. The documentation is how I've learned to code in R.

I NEVER post to R-help until I've exhausted my own ability to solve the problem, INCLUDING combing through the documentation. And looking at other requests for help to see if the problem has arisen for someone else. One of the reasons I wait to the last minute to post to R-help is because it is a very snobby and often unfriendly list, as your comment illustrates. You just assumed I hadn't already looked through the docs. Nice.

The reality of it is, what Michael pointed out to me wasn't something I'd discovered in my own search of the docs. And that, Bert, is why someone like me comes to R-help.

Please, Bert, don't we want the whole world to use R? They won't if the community is so unwelcoming. Now, if you have a solution, please post it. If not, leave off while I explore Michael's suggestion.

Best,
Jen
Hi Peter, I'm really sorry, I thought I was in plain text. I don't use any formatting in my emails and in Gmail the HTML looks the same as plain text. Anyway, I've attached the data (I didn't think we could do that but I am frequently wrong). I say many cases because this is just a subset of >300 observations. The error seems to happen without a pattern I can discern. I am assuming I am doing something wrong. Thanks, Jen
Hello,

Michael's standard guess, FAQ 7.31, was also mine, but is wrong. The error is in Jennifer's flag column, not the result of her ifelse. (!)

x <- scan(what="character", text="
PM.EXP PM.DIST.TOT PM.DIST_flag
0 0 0
0 0 0
0 0 0
177502 1 0
31403 1 0
0 0 0
1100549 1 0
38762 1 0
0 0 0
20025 1 0
0 0 0
13742 1 0
0 0 0
83078 1 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
165114 1 0
0 0 0
417313 1 0
3546 1 0
4613 1 0
225460 1 0
6417 1 1
23 1 0
3402 1 0
8504 1 1
8552 1 0
9723 1 0
37273 1 1
396 1 0
1478 1 0
2074 1 0
12220 1 1
97691 2 1
0 0 0
33993 2 1
")
vn <- matrix(as.numeric(x[-(1:3)]), ncol=3, byrow=TRUE)
vn <- data.frame(vn)
names(vn) <- x[1:3]
str(vn)

vn$flag2 <- ifelse( (vn$PM.EXP > 0.0) & (vn$PM.DIST.TOT != 1.0), 1, 0 )
str(vn)

identical(vn$PM.DIST_flag, vn$flag2) # FALSE

inx <- vn$PM.DIST_flag != vn$flag2
vn[inx, ]

As you can see, there's nothing wrong with ifelse. The second standard guess is that Jennifer had something, a variable, in her session messing up the variables involved in the ifelse.

Also, when the result of an if/else or ifelse is, in this order, 1 or 0, the following can be used, with performance benefits.

vn$flag3 <- 1 * (vn$PM.EXP > 0.0 & vn$PM.DIST.TOT != 1.0) # make T/F numeric 1/0

Hope this helps,

Rui Barradas
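As a small illustration of Rui's last point, the sketch below compares ifelse(cond, 1, 0) with the multiply-by-one form on made-up vectors (exp_amt and tot are hypothetical names and values). It also shows how a missing value propagates, a case the thread does not discuss.

```r
# Hypothetical values, not the poster's data; the last amount is missing.
exp_amt <- c(0, 6417, 97691, NA)
tot     <- c(0, 1, 2, 2)

# The condition is a logical vector; it is NA where it cannot be decided.
cond <- exp_amt > 0 & tot != 1

flag_ifelse <- ifelse(cond, 1, 0)  # an NA condition gives an NA result
flag_mult   <- 1 * cond            # TRUE -> 1, FALSE -> 0, NA -> NA
flag_int    <- as.integer(cond)    # same values, stored as integer

print(data.frame(exp_amt, tot, cond, flag_ifelse, flag_mult, flag_int))
```

Both forms give the same numeric flag here; keeping cond as a logical column is another option when a 1/0 coding is not strictly needed.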
BTW - no one else who has replied to this topic was snobby or unfriendly and I thank you very much for trying to help me. It's just Bert is not the first to respond to my request for help as such. As someone looking forward to becoming an advanced R programmer in my statistical work it is discouraging to be castigated FOR ASKING FOR HELP. Thank you, all. Best, Jen
Hi Rui, Thanks so much for responding but I think with my HTML problem the vn data you made must not be the same. I tried running your code on the data (I uploaded a copy) and I got the same thing I had before. Jen
Off the wall / wild guess, do you use attach() frequently? Not entirely sure how it would come up, but it tends to make weird errors like this occur.

M
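One way attach() can produce this kind of confusion is sketched below, with a hypothetical data frame d and made-up column names amt and tot. It is an illustration of the general hazard, not a diagnosis of this thread's problem.

```r
# Hypothetical data frame; amt and tot are made-up column names.
d <- data.frame(amt = c(0, 5, 7), tot = c(0, 1, 2))

attach(d)          # places a copy of d's columns on the search path

d$tot[2] <- 3      # changes the data frame, but not the attached copy

stale   <- tot     # found via the search path: still the old values
current <- d$tot   # the data frame itself has the new value

detach(d)          # always undo attach() when done

print(rbind(stale = stale, current = current))
```

Referring to columns explicitly as d$tot, or using with(d, ...), avoids the stale copy.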
Hi Michael,

No, I never use attach(), exactly for the reasons you state. To do due diligence I did a search of my code for the function and it didn't come up (I would have been shocked because I never use it!).

Now that real data is up, does your suggestion still apply? I am reading it now.

Thanks,
Jen
No data arrived to me.

Rui Barradas
If you mean the data you sent to Peter, it got scrubbed by the list servers as well (they are somewhat draconian, but appropriately so in the long run).

The absolute best way to send R data via email (esp on this list) is to use the dput() function which will create a plain text representation of your data _exactly_ as R sees it. It's a little hard for the untrained eye to parse (I can usually get about 90% of what it all means but there's some stuff with rownames = NA I've never looked into) but it's perfectly reproducible to a different R session. Then us having the same data is a simple copy+paste away.

For more on dput() and reproducibility generally, see http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

It could be the floating point thing (it's hard to say without knowing how your data was calculated), but Rui seems to think not.

M
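A rough sketch of why dput() matters for a question like this one: printed output is rounded for display, while dput() text carries more digits. The vector x is hypothetical, and the capture.output()/parse() step merely stands in for copying the text out of an email and pasting it into another session.

```r
# Hypothetical values; the second one is not exactly 1.
x <- c(1, 1 + 1e-10)

print(x)   # default printing rounds to 7 significant digits
dput(x)    # dput() writes up to 15 significant digits by default

# Round trip: capture the dput() text and rebuild the vector from it.
txt <- paste(capture.output(dput(x)), collapse = "\n")
y <- eval(parse(text = txt))

x == 1     # TRUE FALSE
y == 1     # the rebuilt vector keeps the difference
```

A hand-typed table of the printed values would have lost exactly the detail that separates the two elements.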
Note that the dput() output is to be put into the body of the email, not in an attachment or we're back where we started ;-) M
Oh, sorry, I first thought you couldn't post data to the list, but then I thought I remembered other people doing so, so I tried to post it. Here is a copy.

Thanks,
Jen
Hi Michael,

Thanks for letting me know how to post data. I will try to upload it that way in a second. I can usually use code to make a reproducible dataset but this time with the ifelse behaving strangely (perhaps, it's probably me) I didn't think I could do it easily so I figured I would just put my data up.

I will check out the R FAQ you mentioned.

Thanks, again,
Jen
I see that you got other responses while I was composing an answer.
Your 'example.csv' did come through for me, but I still can't
replicate your PM.DIST_flag variable. Specifically, observations
30, 33, 36 and 40 are wrong.
I agree with Rui, that there's something else going on. The data
you've sent can't be the data that yielded the 'flag' variable
or you didn't use the ifelse() function in the way that you've
shown.
I would start with a clean R session and I would use the 'convert
logical to numeric' idea (or keep a logical rather than numeric
flag):
vn <- transform(vn,
my_flag = ( (PM.EXP > 0) & (PM.DIST.TOT != 1) ) * 1 )
It looks as though your PM.DIST.TOT variable is meant to be
integer. If so, you might want to ensure that it is that type.
Otherwise, you might want to use Michael's suggestion of using
abs(... - 1) < 1e-05.
Peter Ehlers
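As a rough sketch of the two suggestions in the message above (a logical flag converted to numeric, and making sure the total really is integer-valued before storing it as an integer), something like the following could be tried. The data values and the tolerance variable tol are invented for illustration; only the column names come from the thread.

```r
# Hypothetical values standing in for the poster's columns.
vn <- data.frame(
  PM.EXP      = c(0, 6417, 97691),
  PM.DIST.TOT = c(0, 1, 2)
)

# Build the flag as a logical first, then store it as 0/1.
flag_lgl   <- vn$PM.EXP > 0 & vn$PM.DIST.TOT != 1
vn$my_flag <- as.integer(flag_lgl)

# Numbers typed as 0, 1, 2 are stored as doubles unless converted.
typeof(vn$PM.DIST.TOT)

# Convert only if every value is a whole number within a tolerance.
# as.integer() truncates toward zero, so round() first.
tol   <- 1e-05   # assumed tolerance, borrowed from the thread
whole <- all(abs(vn$PM.DIST.TOT - round(vn$PM.DIST.TOT)) < tol)
if (whole) vn$PM.DIST.TOT <- as.integer(round(vn$PM.DIST.TOT))
```

If the column turns out not to be whole-valued, the conversion is skipped and the tolerance comparison is the safer route.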
On 2012-08-24 14:56, Jennifer Sabatier wrote:
Hi Michael, Thanks for letting me know how to post data. I will try to upload it that way in a second. I can usually use code to make a reproducible dataset but this time with the ifelse behaving strangely (perhaps, it's probably me) I didn't think I could do it easily so I figured I would just put my data up. I will check out the R FAQ you mentioned. Thanks, again, Jen
AAAAAHHHHHHH I GOT IT!!!!!!!!!! And I *think* I understand about floating point arithmetic.
In this case vn$PM.DIST.TOT is the sum of proportions. So, it should be anywhere between 0 and 1. In our case, if it's anything other than 1 when vn$PM.EXP is greater than 0 then it means something is wrong with one of the variables used to sum vn$PM.DIST.TOT. I was worried that making it an integer would cause cases of 0.4 to become 0 and look legal when they're not (though it doesn't actually seem to be a problem).
So, I just did what Michael and Peter suggested, after reading up on floating point numbers.
fpf <- 1e-05 # fpf = floating point fuzz
vn$PM.DIST_flag <- ifelse(vn$PM.EXP > 0 & abs(vn$PM.DIST.TOT - 1) > fpf, 1, 0)
YAAAAAYYYYY!!!! Thanks, solved AND I learned something new. Thanks, all, and have a GREAT weekend! Jen
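To make the failure mode concrete, here is a small sketch of how a sum of proportions can print as 1 and still fail an exact comparison. The component columns p1, p2 and p3 and all values are hypothetical (the thread never shows how PM.DIST.TOT was built), and the results assume ordinary IEEE 754 double arithmetic.

```r
# Hypothetical component proportions; not the poster's data.
vn <- data.frame(
  PM.EXP = c(6417, 8504, 97691, 0),
  p1     = c(0.3, 0.50, 0.5, 0),
  p2     = c(0.6, 0.25, 0.5, 0),
  p3     = c(0.1, 0.25, 1.0, 0)
)
vn$PM.DIST.TOT <- vn$p1 + vn$p2 + vn$p3

# At the default print precision the first two totals both show as 1.
print(vn$PM.DIST.TOT)

# Exact comparison: row 1 gets flagged because its total is a hair below 1.
exact_flag <- ifelse(vn$PM.EXP > 0 & vn$PM.DIST.TOT != 1, 1, 0)

# Comparison with a tolerance: only the genuinely wrong total (row 3) is flagged.
fpf <- 1e-05   # fpf = floating point fuzz, as in the message above
fuzzy_flag <- ifelse(vn$PM.EXP > 0 & abs(vn$PM.DIST.TOT - 1) > fpf, 1, 0)

cbind(vn[, c("PM.EXP", "PM.DIST.TOT")], exact_flag, fuzzy_flag)
```

The totals are built with `+` on purpose: sum() and rowSums() may accumulate in extended precision on some platforms, so they can give slightly different results for the same inputs.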
On Fri, Aug 24, 2012 at 7:29 PM, Jennifer Sabatier
<plessthanpointohfive at gmail.com> wrote:
AAAAAHHHHHHH I GOT IT!!!!!!!!!! And I *think* I understand about floating point arithmetic..
Well then you're doing much better than the rest of us: it's quite a difficult subject and only gets trickier as you think about it more. (Numerical analysis generally, not the definition of an IEEE 754 / IEC 60559 double.) You even get such fun as -1 * 0 != 1 * 0 under some interpretations.
In this case vn$PM.DIST.TOT is the sum of proportions. So, it should be anywhere between 0 and 1. In our case, if it's anything other than 1 when vn$PM.EXP is greater than 0 then it means something is wrong with one of the variables used to sum vn$PM.DIST.TOT. I was worried that making it an integer would cause cases of 0.4 to become 0 and look legal when they're not (though it doesn't actually seem to be a problem). So, I just did what Michael and Peter suggested, after reading up on floating point numbers. fpf <- 1e-05 # fpf = floating point fuzz
Though I suggested 1e-05 here, usually one uses slightly more stringent testing: a general rule of thumb is the square root of machine precision. In R terms, sqrt(.Machine$double.eps)
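A minimal sketch of that rule of thumb, using the sqrt(2)^2 example from R FAQ 7.31 rather than the poster's data. The variable names tol and x are just illustrative, and the results assume IEEE 754 doubles.

```r
# Rule-of-thumb tolerance: square root of machine epsilon,
# roughly 1.5e-08 for IEEE 754 doubles.
tol <- sqrt(.Machine$double.eps)

# Mathematically 2, but not exactly 2 in floating point.
x <- sqrt(2)^2

x == 2                    # exact comparison
abs(x - 2) < tol          # comparison with an explicit tolerance
isTRUE(all.equal(x, 2))   # all.equal() uses a tolerance of about 1.5e-08 by default
```

all.equal() returns a character description rather than FALSE when the values differ, which is why it is wrapped in isTRUE() whenever the result feeds into if() or ifelse().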
vn$PM.DIST_flag<-ifelse(vn$PM.EXP > 0 & abs(vn$PM.DIST.TOT - 1) > fpf , 1, 0) YAAAAAYYYYY!!!! Thanks, solved AND I learned something new. Thanks, alll, and have a GREAT weekend! Jen
Just for the "macro-take-away": this is the reason we don't really like console printout instead of dput() to show a problem: if you dput the original not-yet-ifelse-d numbers, you'll see that they really aren't 1's, but that they are truncated upon regular printing. Cheers and don't forget the old adage: 0.1*10 is hardly ever 1, Michael
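A quick way to see the point about printing, sketched with a made-up value rather than the poster's data: the default print method shows about 7 significant digits, so a number a hair below 1 is displayed as 1. Note that, depending on R version and the control settings, dput() itself may round to about 15 significant digits, so asking for 17 digits explicitly is the more reliable check.

```r
# Hypothetical value: mathematically 1, but slightly below 1 as a double.
x <- c(1, 0.3 + 0.6 + 0.1)

# Default printing rounds to about 7 significant digits.
print(x)

# Asking for 17 significant digits shows the stored values.
print(x, digits = 17)
sprintf("%.17g", x)

# The exact comparison agrees with the long printout, not the short one.
x == 1
```

Checking a suspicious column this way before writing the comparison usually settles whether floating point is the culprit.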
Hah - I guess I didn't mean I understood it in full as I expect I will run into it again without anticipating it. But, now that I know the "old adage" I will look there first when I run into a problem. Also, I used the square root of machine precision instead - thanks for that, too. Thank you, thank you! I really appreciate all the help. Best, Jen On Fri, Aug 24, 2012 at 10:31 PM, R. Michael Weylandt
<michael.weylandt at gmail.com> wrote:
sqrt(.Machine$double.eps)