Hi,

I just observed a strange behavior in R: the rnorm function does not always return the number of values I ask for. I think it is somehow related to the internal representation of double-type numbers, but I am not sure whether this is supposed to happen. Below is a reproducible example:

```
## Create a list; we will only take the fourth value, which is 0.6
nList <- seq(0,1,0.2)
n <- nList[4]
n
# [1] 0.6
length(rnorm(1000*n))
# [1] 600
length(rnorm(1000-1000*n))
# [1] 399 <--- What happened here?
length(rnorm(1000-1000*0.6))
# [1] 400
1000-1000*n
# [1] 400 <- this looks good to me...
1000-1000*0.6
# [1] 400
identical(n, 0.6)
# [1] FALSE
.Internal(inspect(n))
# @0x00000217c75d79d0 14 REALSXP g0c1 [REF(1)] (len=1, tl=0) 0.6
.Internal(inspect(0.6))
# @0x00000217c791e0c8 14 REALSXP g0c1 [REF(2)] (len=1, tl=0) 0.6
```

As you can see, length(rnorm(1000-1000*n)) does not give me the result I want. This is somewhat surprising, because it is hard to imagine that a manually typed 0.6 can behave differently from a 0.6 taken from a sequence. Furthermore, 0.6 is the only problematic number in `nList`; the remaining numbers work fine. I guess it is due to the rounding mechanism, but I think this should be treated as a bug: if the print function shows the result of 1000-1000*n correctly, it is strange that rnorm behaves differently.

Below is my session info:

R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8

time zone: America/Chicago
tzcode source: internal
Strange Behavior in RNG
6 messages · Jiefei Wang, John Fox, Rui Barradas +2 more
Dear Jiefei Wang,

This is really a more appropriate question for the r-help list than for the r-devel list. Nevertheless, see item 7.31 in the R FAQ <https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f>, about floating-point arithmetic.

I hope this helps,
 John
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://www.john-fox.ca/
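A minimal sketch of the FAQ 7.31 point, reusing the values from the original post: `identical()` and `==` require bit-for-bit equality, whereas `all.equal()` compares with a tolerance (by default `sqrt(.Machine$double.eps)`, about 1.5e-8). Wrapping `all.equal()` in `isTRUE()` makes it safe to use inside `if()`.

```r
# Reproduce the value from the original post
nList <- seq(0, 1, 0.2)
n <- nList[4]

# Exact comparisons fail: n is not bit-for-bit equal to the literal 0.6
identical(n, 0.6)
#> [1] FALSE
n == 0.6
#> [1] FALSE

# A tolerance-based comparison treats them as equal
isTRUE(all.equal(n, 0.6))
#> [1] TRUE
```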
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Hello,

This is R FAQ 7.31. In fact, the sequences

seq(0, 1, 0.1)
seq(0, 1, 0.2)

should probably be included as FAQ 7.31 examples. If you print the numbers with more decimals, you will see where the error comes from.

# generate the list
nList <- seq(0,1,0.2)
# compare the list with manually typed numbers
nList != c(0, 0.2, 0.4, 0.6, 0.8, 1)
#> [1] FALSE FALSE FALSE  TRUE FALSE FALSE
# note the value of 0.6
print(nList, digits = 16L)
#> [1] 0.0000000000000000 0.2000000000000000 0.4000000000000000 0.6000000000000001
#> [5] 0.8000000000000000 1.0000000000000000

Hope this helps,

Rui Barradas
This email has been checked for viruses by AVG antivirus software. www.avg.com
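Following Rui's suggestion to print more digits, a short sketch applied to the intermediate value from the original post. R's default printing uses 7 significant digits (`getOption("digits")`), which is why `1000 - 1000*n` displays as 400 even though the stored value is slightly smaller.

```r
nList <- seq(0, 1, 0.2)
n <- nList[4]
x <- 1000 - 1000 * n

# Default printing (7 significant digits) rounds the display to 400
print(x)
#> [1] 400

# With 17 significant digits the stored value shows up as just below 400
print(x, digits = 17)
sprintf("%.17g", x)

# The gap from 400 is tiny but negative
x - 400
#> [1] -1.136868e-13
x < 400
#> [1] TRUE
```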
Hi Rui and John,

Thanks for your replies. I'm not sure this is a question for R-help, as I think the behavior of the RNG is weird, but I will be happy to move this discussion if the admins think it is off-topic here. I was a C/C++ developer, so I understand that double-type numbers can sometimes produce surprising results, but what is unexpected here is that even though the number is extremely close to 400, 'rnorm' still rounds it down to 399. Shouldn't it be rounded up in this case? Probably the underlying code just converts the number into an int type, but I was expecting the function to tolerate a certain degree of error. Maybe I expect too much from it...

Best,
Jiefei
Hi Jiefei,

I don't believe there's an issue with the RNG (random number generator). The unexpected result you're seeing comes from passing a non-integer (double) value as the `n` argument of rnorm(), which is a count. When you pass a double, it is converted to a whole number by truncating the fractional part, as as.integer() does, rather than by rounding. Here's an example to illustrate:

nList <- seq(0, 1, 0.2)
n <- nList[4]
n
[1] 0.6
1000 - 1000 * n
[1] 400
as.integer(1000 - 1000 * n)
[1] 399

As you can see, it is truncated to 399.

Regards,
Xianying Tan (shrektan)
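A short sketch of the truncation Xianying describes, together with one possible guard: rounding the count explicitly before calling `rnorm()`. The explicit `round()` is this example's choice; `rnorm()` itself does not round its `n` argument.

```r
nList <- seq(0, 1, 0.2)
n <- nList[4]
x <- 1000 - 1000 * n   # stored as slightly less than 400

# Conversion to a whole number drops the fractional part
as.integer(x)   # 399
trunc(x)        # 399

# Rounding first gives the intended count
round(x)        # 400

length(rnorm(x))          # 399: the count is truncated
length(rnorm(round(x)))   # 400: the count was rounded beforehand
```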
You could argue that the 'n' argument should be rounded rather than truncated, but this form of coercion from float to integer is common and standard (in C, for example). In any case, it's a long-standing part of R and is very unlikely to be changed ...
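To make the rounding-versus-truncation point concrete: `as.integer()` truncates toward zero, like a C `(int)` cast, so it differs from `floor()` for negative numbers. The function `as_count()` below is hypothetical and not part of base R; it sketches the kind of tolerance Jiefei asked about by accepting a double only when it lies within a relative tolerance of a whole number. The default tolerance `sqrt(.Machine$double.eps)` is an assumption borrowed from the default used by `all.equal()`.

```r
# as.integer() truncates toward zero; it does not floor
as.integer(2.9)    # 2
as.integer(-2.9)   # -2
floor(-2.9)        # -3

# Hypothetical helper (not in base R): return the nearest whole number as an
# integer if x is within a relative tolerance of it, otherwise raise an error.
as_count <- function(x, tol = sqrt(.Machine$double.eps)) {
  r <- round(x)
  if (abs(x - r) > tol * max(1, abs(x))) {
    stop("value is not close to a whole number: ", format(x, digits = 17))
  }
  as.integer(r)
}

nList <- seq(0, 1, 0.2)
n <- nList[4]
as_count(1000 - 1000 * n)                 # 400
length(rnorm(as_count(1000 - 1000 * n)))  # 400
```

Raising an error for values such as 2.5, instead of silently rounding them, is a design choice: it keeps genuinely non-integer counts from being hidden.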