
How important is set.seed

Not wrong, just mostly different words.
1) I think of reproducible code as something for teaching or sharing. It can be useful in debugging if I want help (one reason for sharing). In solo debugging of my own code, I have not used set.seed() -- at least not yet. However, my programs are all small, mostly fewer than 100 lines of code.
2) Agreed. 
3) Agreed -- one needs to be very clear on why one is using set.seed(). In many situations it undoes the purpose of using a random number generator.
4) Agreed -- this is why it is so important to publish the versions of R and of the packages used when presenting results. A great deal of effort has gone into building and selecting a good RNG. Depending on how the RNG is used, a basic understanding of what defines "good" is valuable. If there are huge numbers of calls to the RNG, then periodicity in the RNG may start making a difference. Random.org might be another place for the OP to explore.
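
As a minimal sketch of point 4, the R version and the RNG settings can be captured next to the results using base R only. The variable names here are illustrative, not part of any convention.

```r
# Record the environment alongside the results so others can reproduce them.
r_version <- R.version.string   # version string of the running R
rng_kinds <- RNGkind()          # character vector naming the generators in use

print(r_version)
print(rng_kinds)

# sessionInfo() additionally lists the attached packages and their versions.
print(sessionInfo())
```

Pasting this output into a report or an email gives readers what they need to match the setup.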

Tim

-----Original Message-----
From: Bert Gunter <bgunter.4567 at gmail.com> 
Sent: Tuesday, March 22, 2022 12:12 PM
To: Neha gupta <neha.bologna90 at gmail.com>
Cc: Ebert,Timothy Aaron <tebert at ufl.edu>; r-help at r-project.org
Subject: Re: [R] How important is set.seed


OK, I'm somewhat puzzled by this discussion. Maybe I'm just clueless. But...

1. set.seed() is used to make any procedure that uses R's pseudo-random number generator -- including, for example, sampling from a distribution, random data splitting, etc. -- "reproducible". That is, if the procedure is repeated *exactly,* by invoking set.seed() with its original argument values (once!) *before* the procedure begins, exactly the same results should be produced by the procedure. Full stop. It does not matter how many times random number generation occurs within the procedure thereafter -- R preserves the state of the RNG between invocations (but see the notes in ?set.seed for subtle qualifications of this claim).

2. Hence, if no (pseudo-) random number generation is used, set.seed() is irrelevant. Full stop.

3. Hence, if you don't care about reproducibility (you should! -- if for no other reason than debugging), you don't need set.seed()

4. The "randomness" of any sequence of results from any particular set.seed() arguments (including further calls to the RNG) is a complex issue. ?set.seed has some discussion of this, but one needs considerable expertise to make informed choices here. As usual, we untutored users should be guided by the expert recommendations of the Help file.
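
A minimal sketch of point 1, assuming base R only: run_procedure() is a hypothetical stand-in for any procedure that calls the RNG several times, and the seed value 42 is arbitrary.

```r
# Hypothetical procedure that makes several separate calls to the RNG.
run_procedure <- function() {
  x   <- rnorm(5)        # first draw
  idx <- sample(10, 3)   # second draw, continuing from the state left by rnorm()
  list(x = x, idx = idx)
}

set.seed(42)             # seed once, before the procedure begins
first <- run_procedure()

set.seed(42)             # same seed again, so the same stream is replayed
second <- run_procedure()

print(identical(first, second))   # TRUE

# Without reseeding, the stream simply continues, so the draws differ.
third <- run_procedure()
print(identical(first$x, third$x))
```

Seeding once at the top is enough; calling set.seed() again inside the procedure is not needed for reproducibility.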

*** If anything I have said above is wrong, I would greatly appreciate a public response here showing my error.***

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Tue, Mar 22, 2022 at 7:48 AM Neha gupta <neha.bologna90 at gmail.com> wrote: