Why does R think these numbers ***are*** equal?
In a somewhat bizarre set of circumstances I calculated
x0 <- 0.03580067
x1 <- 0.03474075
y0 <- 0.4918823
y1 <- 0.4474461
dx <- x1 - x0
dy <- y1 - y0
xx <- (x0 + x1)/2
yy <- (y0 + y1)/2
chk <- yy*dx - xx*dy + x0*dy - y0*dx
If you think about it ***very*** carefully ( :-) ) you'll see that
``chk'' ought to be zero.
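Spelling out the "think about it very carefully" step: substituting the definitions of dx, dy, xx and yy shows that chk vanishes identically in exact arithmetic. Any nonzero value is therefore pure rounding error.

```latex
\begin{aligned}
\mathit{chk} &= \mathit{yy}\,\mathit{dx} - \mathit{xx}\,\mathit{dy} + x_0\,\mathit{dy} - y_0\,\mathit{dx} \\
             &= (\mathit{yy} - y_0)\,\mathit{dx} - (\mathit{xx} - x_0)\,\mathit{dy} \\
             &= \tfrac{1}{2}(y_1 - y_0)\,\mathit{dx} - \tfrac{1}{2}(x_1 - x_0)\,\mathit{dy} \\
             &= \tfrac{1}{2}\,\mathit{dy}\,\mathit{dx} - \tfrac{1}{2}\,\mathit{dx}\,\mathit{dy} = 0
\end{aligned}
```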
Blow me down, R gets 0. Exactly. To as many significant digits/decimal
places as I can get it to print out.
But .... I wrote a wee function in C to do the *same* calculation and
dyn.load()-ed it and called it with .C(). And I got -1.248844e-19.
This is of course zero, to all floating point arithmetic intents and
purposes. But if I name the result returned by my call to .C() ``xxx''
and ask
xxx >= 0
I get FALSE whereas ``chk >= 0'' returns TRUE (as does ``chk <= 0'', of
course). (And inside my C function, the comparison ``xxx >= 0'' yields
``false'' as well.)
I was vaguely thinking that raw R arithmetic would be equivalent to C
arithmetic. (Isn't R written in C?)
Can someone explain to me how it is that R (magically) gets it exactly
right, whereas a call to .C() gives the sort of ``approximately right''
answer that one might usually expect? I know that R Core is ***good***
but even they can't make C do infinite precision arithmetic. :-)
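A sketch of how evaluation order alone can matter, using the posted numbers in plain R: `chk` is computed in the posted order and again as `(yy - y0)*dx - (xx - x0)*dy` (`chk2`, a name introduced here), which is the same expression on paper. Both are ordinary 64-bit double computations. The printed digits are deliberately not asserted here, since they depend on how each intermediate result rounds on a given platform.

```r
# Inputs as posted
x0 <- 0.03580067; x1 <- 0.03474075
y0 <- 0.4918823;  y1 <- 0.4474461
dx <- x1 - x0;       dy <- y1 - y0
xx <- (x0 + x1)/2;   yy <- (y0 + y1)/2

# Posted evaluation order: every intermediate is rounded to a double
chk <- yy*dx - xx*dy + x0*dy - y0*dx

# Algebraically identical regrouping, evaluated in a different order
chk2 <- (yy - y0)*dx - (xx - x0)*dy

# Print enough digits to see whether any residue survives in either order
sprintf("%.6e", c(chk, chk2))
```

Whatever the last bits turn out to be, both values are tiny relative to the products involved (around 1e-3), so their exact sign carries no information.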
This is really just idle curiosity --- I realize that this phenomenon is
one that I'll simply have to live with. But if I can get some deeper
insight as to why it occurs, well, that would be nice.
cheers,
Rolf Turner
Inverse of FAQ 7.31 ("Why doesn't R think these numbers are equal?").
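The usual advice from R FAQ 7.31 is to compare floating-point results with a tolerance rather than exactly. A minimal sketch applied to the value returned by the .C() call; `ge0()` is a hypothetical helper and its default tolerance is an assumption. Because the target here is tiny, all.equal() falls back to an absolute (not relative) comparison.

```r
# Value reported from the .C() call
xxx <- -1.248844e-19

xxx >= 0                    # FALSE: an exact test sees the sign of the rounding residue
isTRUE(all.equal(xxx, 0))   # TRUE: equal to 0 within the default tolerance

# Hypothetical helper: "non-negative up to a tolerance". The default tolerance
# (about 1.5e-8) is an assumption; in real code, scale it to the size of the inputs.
ge0 <- function(x, tol = sqrt(.Machine$double.eps)) x >= -tol
ge0(xxx)                    # TRUE
```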
7 messages · Rolf Turner, Chandra Salgado Kent, ONKELINX, Thierry +3 more
Dear Chandra,
You're on the wrong track. You don't need for loops as you can do this
vectorised:
as.numeric(interaction(data$Groups, data$Dates, drop = TRUE))
Best regards,
Thierry
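A self-contained run of the one-liner above on the example data from the question. interaction() combines the two columns into one factor; with the default lex.order = FALSE the first argument (Groups) varies fastest within each date, and drop = TRUE removes combinations that never occur, so the factor codes come out as the desired IDs.

```r
# Example data from the original question
Dates  <- c("12/10/2010", "12/10/2010", "12/10/2010",
            "13/10/2010", "13/10/2010", "13/10/2010")
Groups <- c("A", "B", "B", "A", "B", "C")
data   <- data.frame(Dates, Groups)

# Levels run A.12/10, B.12/10, A.13/10, B.13/10, C.13/10 (C.12/10 is dropped);
# as.numeric() turns the factor into its integer codes.
data$ID <- as.numeric(interaction(data$Groups, data$Dates, drop = TRUE))
data$ID   # 1 2 2 3 4 5
```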
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Chandra Salgado Kent
Sent: Tuesday, 2 August 2011 9:12
To: r-help at r-project.org
Subject: [R] Loops to assign a unique ID to a column
Dear R help,
I am fairly new in data management and programming in R, and am trying to
write what is probably a simple loop, but am not having any luck. I have a
dataframe with something like the following (but much bigger):
Dates<-c("12/10/2010","12/10/2010","12/10/2010","13/10/2010",
"13/10/2010", "13/10/2010")
Groups<-c("A","B","B","A","B","C")
data<-data.frame(Dates, Groups)
I would like to create a new column in the dataframe, and give each distinct
date by group a unique identifying number starting with 1, so that the resulting
column would look something like:
ID<-c(1,2,2,3,4,5)
The loop that I have started to write is something like this (but doesn't work!):
data$ID<-as.number(c())
for(i in unique(data$Dates)){
for(j in unique(data$Groups)){ data$ID[i,j]<-i
i<-i+1
}
}
Am I on the right track?
Any help on this is much appreciated!
Chandra
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
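For comparison, a sketch of the explicit double loop the question was reaching for, assuming the example `data` above. It selects the rows for each date/group pair and hands out a new number only when that pair actually occurs; `next_id` is a name introduced here. The numbering follows the order in which dates and groups first appear, which matches the desired `ID` only because the example data are already sorted.

```r
Dates  <- c("12/10/2010", "12/10/2010", "12/10/2010",
            "13/10/2010", "13/10/2010", "13/10/2010")
Groups <- c("A", "B", "B", "A", "B", "C")
data   <- data.frame(Dates, Groups)

data$ID <- NA_integer_   # start with an empty ID column
next_id <- 1L            # next number to hand out
for (d in unique(data$Dates)) {
  for (g in unique(data$Groups)) {
    rows <- data$Dates == d & data$Groups == g   # rows with this date/group pair
    if (any(rows)) {                             # skip pairs that never occur
      data$ID[rows] <- next_id
      next_id <- next_id + 1L
    }
  }
}
data$ID   # 1 2 2 3 4 5
```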
On Aug 2, 2011, at 08:02 , Rolf Turner wrote:
Why does R think these numbers ***are*** equal? [...]
I think the long and the short of it is that R lost a couple of bits of precision that C retained. This sort of thing happens if R stores things into 64 bit floating point objects while C keeps them in 80 bit CPU registers. In general, floating point calculations do not obey the laws of math, for example the associative law (i.e., (a+b)-c ?= a+(b-c), especially if b and c are large and nearly equal), so any reordering of expressions by the compiler may give a slightly different result.
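A minimal R illustration of the non-associativity point. The 80-bit register effect depends on the compiler, its flags and the CPU, and is not reproduced here; this only shows that regrouping an expression changes the rounded result. The variable is named `c0` rather than `c` to avoid masking base::c.

```r
# Each "+" rounds its result to the nearest representable double,
# so grouping matters.
(0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)   # FALSE

# The case described above: b and c large and (here exactly) equal.
a  <- 1
b  <- 1e16
c0 <- 1e16
(a + b) - c0   # 0: doubles near 1e16 are spaced 2 apart, so a + b rounds back to b
a + (b - c0)   # 1
```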
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
"Døden skal tape!" ("Death shall lose!") --- Nordahl Grieg
How about this?
> indx <- unique(cbind(Dates, Groups))
> indx
     Dates        Groups
[1,] "12/10/2010" "A"
[2,] "12/10/2010" "B"
[3,] "13/10/2010" "A"
[4,] "13/10/2010" "B"
[5,] "13/10/2010" "C"
> indx <- data.frame(indx, id=1:nrow(indx))
> indx
       Dates Groups id
1 12/10/2010      A  1
2 12/10/2010      B  2
3 13/10/2010      A  3
4 13/10/2010      B  4
5 13/10/2010      C  5
> newdata <- merge(data, indx)
> newdata
       Dates Groups id
1 12/10/2010      A  1
2 12/10/2010      B  2
3 12/10/2010      B  2
4 13/10/2010      A  3
5 13/10/2010      B  4
6 13/10/2010      C  5
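A self-contained version of the unique()/merge() approach above, with the example data included. stringsAsFactors = FALSE is set explicitly (an addition here) so the key columns are character in any R version, and seq_len() replaces 1:nrow(). Note that merge() joins on the shared columns (Dates, Groups) and, with its default sort = TRUE, returns rows sorted by those keys rather than in the original row order; that makes no difference here only because the example data are already sorted.

```r
Dates  <- c("12/10/2010", "12/10/2010", "12/10/2010",
            "13/10/2010", "13/10/2010", "13/10/2010")
Groups <- c("A", "B", "B", "A", "B", "C")
data   <- data.frame(Dates, Groups, stringsAsFactors = FALSE)

# One row per distinct Dates/Groups pair, numbered in order of appearance
indx <- unique(cbind(Dates, Groups))
indx <- data.frame(indx, id = seq_len(nrow(indx)), stringsAsFactors = FALSE)

# Attach each row's id by joining on the shared key columns
newdata <- merge(data, indx)
newdata$id   # 1 2 2 3 4 5
```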
----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Chandra Salgado Kent
Sent: Tuesday, August 02, 2011 2:12 AM
To: r-help at r-project.org
Subject: [R] Loops to assign a unique ID to a column
Dear R help, [...]
Whoa!
1. First and most important, there is very likely no reason you need
to do this. R can handle multiple groupings automatically in fitting
and plotting without creating artificial labels of the sort you appear
to want to create. Please read an "Intro to R" and/or get help to see
how.
2. The "solution" offered below is unnecessarily convoluted. Here is a
simpler and faster one:
z <- within(z, indx <- as.numeric(interaction(Dates,Groups,
drop=TRUE, lex.order=TRUE)))
Explanation:
interaction() produces all possible combinations of the individual
groupings; drop=TRUE throws away any unused combinations, and
lex.order=TRUE lexicographically orders the levels as you indicated.
?interaction for details.
By default, the result of the above is a factor, which as.numeric()
converts to the numeric codes used in factor representations. ?factor
Finally, within() interprets and makes changes within z. The changed
result is then assigned back to z so that it is not lost. ?within
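A self-contained run of the within()/interaction() one-liner on the example data, keeping the name `z` for the data frame. The levels() call is only there to show the lexicographic ordering, with the unused 12/10/2010.C combination dropped.

```r
Dates  <- c("12/10/2010", "12/10/2010", "12/10/2010",
            "13/10/2010", "13/10/2010", "13/10/2010")
Groups <- c("A", "B", "B", "A", "B", "C")
z <- data.frame(Dates, Groups)

# lex.order = TRUE: Dates varies slowest, Groups fastest
# drop = TRUE: combinations that never occur are removed
z <- within(z, indx <- as.numeric(interaction(Dates, Groups,
                                              drop = TRUE, lex.order = TRUE)))
z$indx   # 1 2 2 3 4 5

levels(interaction(z$Dates, z$Groups, drop = TRUE, lex.order = TRUE))
# "12/10/2010.A" "12/10/2010.B" "13/10/2010.A" "13/10/2010.B" "13/10/2010.C"
```

On these data the codes match those from interaction(Groups, Dates, drop = TRUE): swapping the argument order has the same effect on the level ordering as setting lex.order = TRUE, although the level labels differ.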
Cheers,
Bert
On Tue, Aug 2, 2011 at 8:36 AM, David L Carlson <dcarlson at tamu.edu> wrote:
How about this? [...]
"Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions." -- Maimonides (1135-1204) Bert Gunter Genentech Nonclinical Biostatistics
Thanks to Peter Dalgaard and to Baptiste Auguie (off-list) for the
insights they provided.
cheers,
Rolf Turner