Please consider the following code:

    set.seed(1)
    train.index = sample(181,150)
    head(train.index)
    # [1]  49  67 103 162  36 159   Result from my ASUS computer
    #
    # [1]  68 167 129 162  43  14   Result from my wife's HP Pavilion computer

In both cases, version 3.6.3 of R is being used.

In addition, of the 20 students in my Predictive Analytics class, 14 got the first result while 6 got the second. These results do not seem to be specific to Mac (macOS) versus PC (Windows). In several cases, students using 3.6.3 got differing results. This makes grading homework challenging, since I don't know which partition of the data each student is using.

Thank you for considering my question.

Sincerely,
Tom Fomby
Professor of Economics
SMU
Dallas, TX 75275
tfomby at smu.edu
Question about "sample" function and inconsistent results I am getting across machines.
8 messages · Jeff Newmiller, Fomby, Tom, Duncan Murdoch
On 03/05/2020 1:39 a.m., Fomby, Tom wrote:
> Please consider the following code: [...]
Likely some of you are storing and restoring workspaces, and have been doing so for a long time. If you type

    RNGkind()

what you should see is

    [1] "Mersenne-Twister" "Inversion"        "Rejection"

but if the .Random.seed is restored from an old session, you might see

    [1] "Mersenne-Twister" "Inversion"        "Rounding"

The latter uses the buggy version of sample(). Those users should run

    RNGkind(sample.kind = "Rejection")

to start using the corrected sampling algorithm. (The default was changed in R 3.6.0, but if you saved your seed from a previous version, you'd get the old sampler.)

They should also stop reloading old workspaces, but that's another discussion.

Duncan Murdoch
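A minimal sketch of the check described above, written as lines a student might put at the top of a homework script. It uses only base R (`RNGkind()`, `set.seed()`, `sample()`); the variable name `kinds` is illustrative. The third element returned by `RNGkind()` is the sample kind, so the script switches to `"Rejection"` only if something (such as a restored `.Random.seed`) has left it on `"Rounding"`.

```r
# Inspect the active generators: c(kind, normal.kind, sample.kind)
kinds <- RNGkind()
print(kinds)

# "Rounding" here means an old .Random.seed (e.g. from a restored
# workspace) has put sample() back on the pre-3.6.0 algorithm.
if (kinds[3] != "Rejection") {
  RNGkind(sample.kind = "Rejection")
}

set.seed(1)
train.index <- sample(181, 150)
head(train.index)
```

Under the corrected sampler, `head(train.index)` should match the HP Pavilion result from the question (68 167 129 162 43 14), since the ASUS is the machine later identified in this thread as using the old algorithm.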
From this side of the conversation, it is a lot easier to be skeptical of the claim that all of these installations are running the same version of R than to believe that the seeded random number generator has started behaving randomly within a single version of R.
On May 2, 2020 10:39:58 PM PDT, "Fomby, Tom" <tfomby at mail.smu.edu> wrote:
> Please consider the following code: [...]
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Sent from my phone. Please excuse my brevity.
I just tried both versions, and it's the ASUS that's using the buggy old algorithm. Duncan Murdoch
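One way to reproduce this comparison is to pass `sample.kind` directly to `set.seed()`, which accepts that argument in R 3.6.0 and later. This is a sketch, not necessarily how the comparison above was actually run; `old_sampler` and `new_sampler` are illustrative names. Selecting `"Rounding"` makes R warn that a non-uniform sampler is in use; the warning is suppressed here only to keep the demonstration quiet.

```r
# Old (pre-3.6.0) sampler: set.seed() switches sample.kind as a side effect.
suppressWarnings(set.seed(1, sample.kind = "Rounding"))
old_sampler <- head(sample(181, 150))

# Corrected sampler, the default since R 3.6.0.
set.seed(1, sample.kind = "Rejection")
new_sampler <- head(sample(181, 150))

print(old_sampler)
print(new_sampler)

# Put the session back on the default sampler so later code is unaffected.
RNGkind(sample.kind = "default")
```

The first vector should match the ASUS result from the original question and the second the HP Pavilion result, which is consistent with the ASUS being the machine on the old algorithm.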
On 03/05/2020 3:32 p.m., Duncan Murdoch wrote:
> On 03/05/2020 1:39 a.m., Fomby, Tom wrote:
>> Please consider the following code: [...]
>
> Likely some of you are storing and restoring workspaces, and have been doing so for a long time. [...]
Dear Duncan,

OK, I will certainly ask my students to download the most recent version of Basic R at the first of each semester and, just to be safe, include the RNGkind(sample.kind="Rejection") command before they get started on the data-partitioning part of their exercise using the sample function.

By the way, how can one take out a membership in the R community so as to support volunteers like yourself?

Thank you,
Tom Fomby
Department of Economics
SMU
Dallas, TX 75275
From: Duncan Murdoch <murdoch.duncan at gmail.com>
Sent: Sunday, May 3, 2020 2:32 PM
To: Fomby, Tom; r-help at R-project.org
Subject: Re: [R] Question about "sample" function and inconsistent results I am getting across machines.
On 03/05/2020 1:39 a.m., Fomby, Tom wrote:
> Please consider the following code: [...]

Likely some of you are storing and restoring workspaces, and have been doing so for a long time. [...]

Duncan Murdoch
Thank you so much, Duncan. Evidently 6 of the 20 students in my class have been using the buggy version of sample(). I am so appreciative that you have helped me get a grip on things; I was tired of having two keys to my homework exercises. It is amazing that you were able to trace the version of sample() on my ASUS computer. Running 3.6.3 did not fix things there because it kept using the buggy version.

Much appreciation,
Tom Fomby
Department of Economics
SMU
From: Duncan Murdoch <murdoch.duncan at gmail.com>
Sent: Sunday, May 3, 2020 2:36:50 PM
To: Fomby, Tom; r-help at R-project.org
Subject: Re: [R] Question about "sample" function and inconsistent results I am getting across machines.
I just tried both versions, and it's the ASUS that's using the buggy old algorithm.

Duncan Murdoch

On 03/05/2020 3:32 p.m., Duncan Murdoch wrote:
> Likely some of you are storing and restoring workspaces, [...]
On 03/05/2020 3:43 p.m., Fomby, Tom wrote:
Dear Duncan, OK, I will certainly ask my students to download the most recent version of Basic R at the first of each semester and, just to be safe, include the RNGkind(sample.kind="Rejection") command before the students get started on the data partitioning part of their exercise using the sample function.
Actually, it would probably be a better idea to say

    RNGkind(kind = "default", normal.kind = "default", sample.kind = "default")

in case bugs are found in any of the current algorithms and they change again.
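A sketch of that advice in script form. Passing `"default"` for each argument asks the running version of R to use whatever it currently treats as its defaults, so the script does not hard-code an algorithm name that might change in a future release. The `print(RNGkind())` line is only there to show which generators ended up active.

```r
# Reset the uniform, normal and sampling generators to this R's defaults,
# overriding any kinds carried over from a restored .Random.seed.
RNGkind(kind = "default", normal.kind = "default", sample.kind = "default")

set.seed(1)
train.index <- sample(181, 150)

# In R 3.6.0 and later the third element should be "Rejection".
print(RNGkind())
```

The reset has to come before `set.seed()`: changing the generator kind re-initializes the random state, so seeding first and resetting afterwards would throw the seed away.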
By the way, how can one take out a membership in the R community so as to support volunteers like yourself?
The R Foundation accepts donations to become a "Supporting Member"; see here: https://www.r-project.org/foundation/donors.html. They sponsor various events, so that is one way. There is probably also a local user group somewhere near you that would appreciate contributions of some sort. There's a list of those here: https://blog.revolutionanalytics.com/local-r-groups.html, and another one here: https://www.meetup.com/pro/r-user-groups/. (I haven't checked how similar those two lists are.) Duncan Murdoch
Thank you so much, Duncan. I will pitch in. Tom
From: Duncan Murdoch <murdoch.duncan at gmail.com>
Sent: Sunday, May 3, 2020 2:56 PM
To: Fomby, Tom; r-help at R-project.org
Subject: Re: [R] Question about "sample" function and inconsistent results I am getting across machines.
On 03/05/2020 3:43 p.m., Fomby, Tom wrote:
> Dear Duncan, [...]

Actually, it would probably be a better idea to say RNGkind(kind = "default", normal.kind = "default", sample.kind = "default") [...]

Duncan Murdoch