Hello everyone, I want to know: (1) In which cases do we need to use set.seed while building ML models? (2) Where exactly should we put the set.seed call, i.e. when we split the data into train/test sets, or just before we train a model? Thank you
How important is set.seed
19 messages · Bijesh Mishra, Jin Li, Neha gupta +3 more
First off, "ML models" do not all use random numbers (for prediction I would guess very few of them do). Learn and pay attention to what the functions you are using do.

Second, if you use random numbers properly and understand the precision that your specific use case offers, then you don't need to use set.seed. However, in practice, using set.seed can allow you to temporarily avoid chasing precision gremlins, or to set up specific test cases for testing code, not results. It is your responsibility to not let this become a crutch: a randomized simulation that is actually sensitive to the seed is unlikely to offer an accurate result.

Where to put set.seed depends a lot on how you are performing your simulations. In general, each process should set it once, uniquely, at the beginning, and if you use parallel processing then use the features of your parallel processing framework to ensure that this happens. Beware of setting all worker processes to use the same seed.
On March 21, 2022 5:03:30 PM PDT, Neha gupta <neha.bologna90 at gmail.com> wrote:
Hello everyone
I want to know
(1) In which cases, we need to use set.seed while building ML models?
(2) Which is the exact location we need to put the set.seed function i.e. when we split data into train/test sets, or just before we train a model?
Thank you
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Sent from my phone. Please excuse my brevity.
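Jeff's warning about worker processes can be made concrete with a small sketch. It assumes the base `parallel` package and a local two-worker cluster; the helper name `draw_on_workers` is hypothetical and only serves the illustration. `clusterSetRNGStream()` gives each worker its own L'Ecuyer-CMRG stream derived from a single seed, so the run is reproducible while the workers do not share a stream.

```r
library(parallel)

# Hypothetical helper: start a small local cluster, seed it once through the
# framework, and return three uniform draws from each worker.
draw_on_workers <- function(seed, n_workers = 2) {
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl))
  # One call seeds every worker with a distinct, reproducible stream.
  clusterSetRNGStream(cl, iseed = seed)
  clusterEvalQ(cl, runif(3))
}

run1 <- draw_on_workers(123)
run2 <- draw_on_workers(123)

identical(run1, run2)            # same seed, same draws on every worker
identical(run1[[1]], run1[[2]])  # the two workers should not share draws
```

Calling `set.seed(123)` separately inside each worker would instead make every worker produce the same numbers, which is the situation Jeff cautions against.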
If you are using the program for data analysis, then set.seed() is not necessary unless you are developing a reproducible example. In a standard analysis it is mostly counter-productive, because one should then ask whether your presented results are an artifact of a specific seed that you selected to get a particular result. However, when you need a reproducible example, are debugging a program, or otherwise need the same result with every run of the program, set.seed() is an essential tool.

Tim

-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Jeff Newmiller
Sent: Monday, March 21, 2022 8:41 PM
To: r-help at r-project.org; Neha gupta <neha.bologna90 at gmail.com>; r-help mailing list <r-help at r-project.org>
Subject: Re: [R] How important is set.seed
If you are drawing a sample and you do not want the sample to change every time you draw it, you had better use a seed. If you are unsure whether to use a seed and you think using one helps, it is better to use it (this is a very general suggestion, and not always applicable or useful). Set the seed before you draw the sample.
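The advice to set the seed right before drawing a sample might look like the following minimal sketch of a train/test split. The helper `split_rows`, the 70/30 fraction, and the use of the built-in `iris` data are assumptions made for illustration only.

```r
# Hypothetical helper: draw the training row numbers reproducibly.
split_rows <- function(n, frac = 0.7, seed = 123) {
  set.seed(seed)                               # seed immediately before sampling
  sort(sample.int(n, size = round(frac * n)))  # row numbers of the training set
}

index <- split_rows(nrow(iris))
train <- iris[index, ]
test  <- iris[-index, ]

# Drawing again with the same seed returns exactly the same rows.
identical(index, split_rows(nrow(iris)))
```

Because the seed is set inside the helper, the split does not depend on how many random numbers were consumed earlier in the session.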
The answer may depend on the type of model you are going to develop. For predictive models, yes, you do need it. The dependence of predictive accuracy measures on random seeds, and of stabilized predictive accuracy measures on random seeds, has been demonstrated and discussed in Spatial Predictive Modeling with R (doi:10.1201/9781003091776), which provides many reproducible examples for various predictive methods, including RF, GBM and SVM.

Hope this helps.
Jin
Jin
------------------------------------------
Jin Li, PhD
Founder, Data2action, Australia
https://www.researchgate.net/profile/Jin_Li32
https://scholar.google.com/citations?user=Jeot53EAAAAJ&hl=en
Thank you all. Actually, I need set.seed because I have to evaluate the consistency of the feature selection produced by different models, so I think using a seed is recommended for this. Warm regards
Ah, so maybe what you need is to think of "set.seed()" as a treatment in an experiment. You could use a random number generator to select an appropriate number of seeds, then use those seeds repeatedly in the different models to see how seed selection influences outcomes. I am not quite sure how many seeds would constitute a good sample. For me that would depend on what I find and how long a run takes.

In parallel processing, you set the seed in the master process and then use a random number generator to set the seeds in each worker.

Tim

From: Neha gupta <neha.bologna90 at gmail.com>
Sent: Tuesday, March 22, 2022 6:33 AM
To: Ebert,Timothy Aaron <tebert at ufl.edu>
Cc: Jeff Newmiller <jdnewmil at dcn.davis.ca.us>; r-help at r-project.org
Subject: Re: How important is set.seed
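Tim's idea of treating the seed as a treatment can be sketched roughly as follows. Everything here is an assumption made for illustration: the built-in `mtcars` data, two linear models standing in for the "different models", 5-fold cross-validated MAE as the outcome, 20 seeds, and the hypothetical helper `cv_mae`.

```r
# Pick the seeds themselves at random (this outer seed only fixes the list).
set.seed(2022)
seeds <- sample.int(10000, size = 20)

# Hypothetical helper: k-fold cross-validated MAE of a linear model, where the
# seed controls only how rows are assigned to folds.
cv_mae <- function(formula, data, seed, k = 5) {
  set.seed(seed)
  fold <- sample(rep(seq_len(k), length.out = nrow(data)))
  response <- all.vars(formula)[1]
  errs <- vapply(seq_len(k), function(i) {
    fit  <- lm(formula, data = data[fold != i, ])
    pred <- predict(fit, newdata = data[fold == i, ])
    mean(abs(data[fold == i, response] - pred))
  }, numeric(1))
  mean(errs)
}

# Two stand-in models, each evaluated with the same set of seeds.
models <- list(small = mpg ~ wt, large = mpg ~ wt + hp + disp)
results <- sapply(models, function(f) {
  sapply(seeds, function(s) cv_mae(f, mtcars, s))
})

# Spread of the outcome across seeds, per model.
apply(results, 2, function(x) c(mean = mean(x), sd = sd(x)))
```

If the standard deviation across seeds is large relative to the difference between models, a conclusion drawn from any single seed deserves little trust.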
Hello Tim,

In some of the examples I see in tutorials, the random seed is set just before the model training, e.g. before the train function of the caret library. Should I follow this?

Best regards
My inclination is to follow Jeff's advice and put it at the beginning of the program. You can always experiment:

set.seed(42)
rnorm(5,5,5)
rnorm(5,5,5)
runif(5,0,3)

As long as the commands are executed in the order they are written, the outcome is the same every time: set.seed() is giving you reproducible outcomes. However, the second rnorm() does not give you the same outcome as the first. set.seed() starts the generator at the same point, so if you want the first and second rnorm() calls to give the same results you will need another set.seed(42) between them.

Note also that it does not matter if you pause: whether you run the above code as a chunk or run each command individually, you get the same result (as long as you do it in the sequence written). So, if you set the seed, run some code, take a break, and come back to write some more code, you might get in trouble because R is still continuing the random number stream started by the original set.seed() call. To solve this issue use

set.seed(Sys.time())

or

set.seed(NULL)

Some of this is just good programming style. A typical workflow:

Import data
Declare variables and constants (set.seed() typically goes here)
Define functions
Body of code
Generate output
Clean up (set.seed(NULL) would go here, along with removing unused variables and such)

Regards,
Tim

From: Neha gupta <neha.bologna90 at gmail.com>
Sent: Tuesday, March 22, 2022 10:48 AM
To: Ebert,Timothy Aaron <tebert at ufl.edu>
Cc: Jeff Newmiller <jdnewmil at dcn.davis.ca.us>; r-help at r-project.org
Subject: Re: How important is set.seed
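One way to avoid the "R is still using the original seed" problem Tim describes is to confine a seed to a single block of code and restore the previous generator state afterwards. The helper below, `with_local_seed`, is a hypothetical sketch written for this note, not something posted in the thread (the withr package offers a similar `with_seed()`); it relies on R keeping the generator state in `.Random.seed` in the global environment.

```r
# Hypothetical helper: evaluate `expr` under a fixed seed, then put the RNG
# state back to whatever it was before the call.
with_local_seed <- function(seed, expr) {
  had_state <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  old_state <- if (had_state) get(".Random.seed", envir = globalenv()) else NULL
  on.exit({
    if (had_state) {
      assign(".Random.seed", old_state, envir = globalenv())
    } else {
      rm(list = ".Random.seed", envir = globalenv())
    }
  })
  set.seed(seed)
  force(expr)  # lazy evaluation: expr is only evaluated here, after seeding
}

# The seeded block is reproducible ...
x1 <- with_local_seed(42, rnorm(5, 5, 5))
x2 <- with_local_seed(42, rnorm(5, 5, 5))
identical(x1, x2)

# ... and it does not disturb the surrounding random number stream.
set.seed(1); a1 <- runif(1); a2 <- runif(1)
set.seed(1); b1 <- runif(1); with_local_seed(42, rnorm(5)); b2 <- runif(1)
identical(a2, b2)
```

With this pattern there is nothing to clean up at the end of the script, because the seeded block never changes the state seen by later code.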
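Thank you again Tim

library(farff)   # provides readARFF()
library(caret)   # provides createDataPartition(), trainControl(), train()
d <- readARFF("my data")
set.seed(123)
# 'index' was not defined in the post; an 80/20 caret split is assumed here
index <- createDataPartition(d$lneff, p = 0.8, list = FALSE)
tr <- d[index, ]
ts <- d[-index, ]
ctrl <- trainControl(method = "repeatedcv", number = 10)
set.seed(123)
ran_search <- train(lneff ~ ., data = tr,
                    method = "mlp",
                    tuneLength = 30,
                    metric = "MAE",
                    preProc = c("center", "scale", "nzv"),
                    trControl = ctrl)
getTrainPerf(ran_search)

Would it be good?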
On Tue, Mar 22, 2022 at 4:34 PM Ebert,Timothy Aaron <tebert at ufl.edu> wrote:
My inclination is to follow Jeff's advice and put it at the beginning of the program. You can always experiment:

set.seed(42)
rnorm(5,5,5)
rnorm(5,5,5)
runif(5,0,3)

As long as the commands are executed in the order they are written, the outcome is the same every time. set.seed() is giving you reproducible outcomes. However, the second rnorm() does not give you the same outcome as the first. So set.seed() starts at the same point, but if you want the first and second rnorm() calls to give the same results you will need another set.seed(42) between them.

Note also that it does not matter if you pause: run the above code as a chunk, or run each command individually, and you get the same result (as long as you do it in the sequence written). So, if you set the seed, run some code, take a break, and come back to write some more code, you might get in trouble because R's generator is still continuing from the original set.seed() call. To solve this issue use

set.seed(Sys.time())

or

set.seed(NULL)

Some of this is just good programming style. A typical workflow:

Import data
Declare variables and constants (set.seed() typically goes here)
Define functions
Body of code
Generate output
Clean up (set.seed(NULL) would go here, along with removing unused variables and such)

Regards,
Tim

On Tuesday, March 22, 2022 10:48 AM, Neha gupta <neha.bologna90 at gmail.com> wrote:

Hello Tim

In some of the examples I see in the tutorials, they put the random seed just before the model training, e.g. the train function in the caret library. Should I follow this?

Best regards

On Tuesday, March 22, 2022, Ebert,Timothy Aaron <tebert at ufl.edu> wrote:

Ah, so maybe what you need is to think of "set.seed()" as a treatment in an experiment. You could use a random number generator to select an appropriate number of seeds, then use those seeds repeatedly in the different models to see how seed selection influences outcomes. I am not quite sure how many seeds would constitute a good sample. For me that would depend on what I find and how long a run takes. In parallel processing you set the seed in the master and then use a random number generator to set seeds in each worker.

Tim

On Tuesday, March 22, 2022 6:33 AM, Neha gupta <neha.bologna90 at gmail.com> wrote:

Thank you all.

Actually I need set.seed because I have to evaluate the consistency of the feature selection generated by different models, so I think for this it's recommended to use the seed.

Warm regards
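The parallel-processing advice in this part of the thread (seed once in the master, give every worker its own stream, never the same seed everywhere) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses base R's parallel package with the "L'Ecuyer-CMRG" generator, it does not start a real cluster, and the names n_workers, streams and draw_from_stream are made up for the example.

```r
library(parallel)

# Switch to the generator that supports independent streams, remembering
# the previous kind so it can be restored afterwards.
old_kind <- RNGkind("L'Ecuyer-CMRG")
set.seed(2022)  # one seed, set once in the "master" session

# Derive one stream per (hypothetical) worker from the master state.
n_workers <- 3
streams <- vector("list", n_workers)
streams[[1]] <- get(".Random.seed", envir = globalenv())
for (i in seq_len(n_workers - 1)) {
  streams[[i + 1]] <- nextRNGStream(streams[[i]])
}

# Stand-in for the work a worker would do: draw from its own stream.
draw_from_stream <- function(stream, n = 3) {
  assign(".Random.seed", stream, envir = globalenv())
  runif(n)
}

draws <- lapply(streams, draw_from_stream)
draws_again <- lapply(streams, draw_from_stream)

identical(draws, draws_again)      # TRUE: every stream is reproducible
identical(draws[[1]], draws[[2]])  # FALSE: workers do not share numbers

RNGkind(old_kind[1])  # restore the previous generator
```

With an actual cluster object, parallel::clusterSetRNGStream(cl, iseed = 2022) hands out streams of this kind to the workers, which is probably what "use the features of your parallel processing framework" amounts to for that package.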
OK, I'm somewhat puzzled by this discussion. Maybe I'm just clueless. But...

1. set.seed() is used to make any procedure that uses R's pseudo-random number generator -- including, for example, sampling from a distribution, random data splitting, etc. -- "reproducible". That is, if the procedure is repeated *exactly,* by invoking set.seed() with its original argument values (once!) *before* the procedure begins, exactly the same results should be produced by the procedure. Full stop. It does not matter how many times random number generation occurs within the procedure thereafter -- R preserves the state of the rng between invocations (but see the notes in ?set.seed for subtle qualifications of this claim).

2. Hence, if no (pseudo-) random number generation is used, set.seed() is irrelevant. Full stop.

3. Hence, if you don't care about reproducibility (you should! -- if for no other reason than debugging), you don't need set.seed()

4. The "randomness" of any sequence of results from any particular set.seed() arguments (including further calls to the rng) is a complex issue. ?set.seed has some discussion of this, but one needs considerable expertise to make informed choices here. As usual, we untutored users should be guided by the expert recommendations of the Help file.

*** If anything I have said above is wrong, I would greatly appreciate a public response here showing my error.***

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
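Point 1 above also answers the "split or train?" part of the original question, and a small sketch may make that concrete. It assumes a made-up two-step procedure (run_procedure is a hypothetical name) that uses the RNG twice, once for a train/test split and once for bootstrap resampling, on the built-in mtcars data. It is not the caret workflow from the thread.

```r
# Hypothetical two-step procedure: a random 80/20 split, then a bootstrap
# standard error of the mean computed on the training rows.
run_procedure <- function(data) {
  idx <- sample(nrow(data), size = floor(0.8 * nrow(data)))  # RNG use 1
  train <- data[idx, ]
  boot_means <- replicate(200, mean(sample(train$mpg, replace = TRUE)))  # RNG use 2
  list(train_rows = idx, boot_se = sd(boot_means))
}

set.seed(123)                    # once, before the procedure begins
run_a <- run_procedure(mtcars)

set.seed(123)                    # same seed, same procedure, same order
run_b <- run_procedure(mtcars)

run_c <- run_procedure(mtcars)   # no re-seeding: the stream just continues

identical(run_a, run_b)  # TRUE: split and bootstrap are both reproduced
identical(run_a, run_c)  # FALSE: different position in the random stream
```

A single call at the top covers both random steps, so a second set.seed() just before the second step is not needed for reproducibility as long as the whole procedure is rerun in the same order.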
I read a paper two days ago (and that's why I then posted here about set.seed) which used interpretable machine learning. According to the authors, the ML models will produce different explanations (of the black-box models) if different seeds are used, or if no seed is set at all.
Well, of course! -- any procedure that incorporates "randomness" will produce *different* random results from different random choices. set.seed() assures you get the same random choices and hence the same random results.

set.seed(567)
sample(1:5,1)
[1] 5
sample(1:5,1)
[1] 4
sample(1:5,1)
[1] 5
sample(1:5,1)
[1] 5
sample(1:5,1)
[1] 2

## change seed
set.seed(123)
sample(1:5,1)
[1] 3
sample(1:5,1)
[1] 3
sample(1:5,1)
[1] 2
sample(1:5,1)
[1] 2
sample(1:5,1)
[1] 3
sample(1:5,1)
[1] 5

## back to original. All subsequent random values as previously
set.seed(567)
sample(1:5,1)
[1] 5
sample(1:5,1)
[1] 4
sample(1:5,1)
[1] 5
sample(1:5,1)
[1] 5
sample(1:5,1)
[1] 2
sample(1:5,1)
[1] 5

Bert Gunter
Assuming train methods MLP and repeatedcv both draw from the R random number generator, they do different things with those numbers. This question is like asking whether player 1 will consistently win if you play war and gin rummy with identically-shuffled decks of cards... all you can tell is that repeating war games will turn out the same. There is no intrinsic value to setting the seed in this case.

If you want to compare results then you need to run each training session with enough variation in seeds that the resulting averaged uncertainty in results is consistent, and then consider comparing between methods. Simply repeating the train calls without using set.seed at all will accomplish this.

But this departs from discussion of the R language into a discussion of how caret::train works (not on-topic here)... which I don't know anything about, but you clearly need to understand better.
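A rough sketch of "vary the seed, then compare averages" is given below. It is an illustration only, not the caret workflow: it assumes two plain lm() models on the built-in mtcars data, and the helper split_mae and the seed list 1:50 are made up for the example. Explicit seeds are used here so the whole comparison can be rerun and both models see the same splits; dropping set.seed() entirely, as suggested above, would serve the same purpose of sampling the seed-to-seed variation.

```r
# Mean absolute error of one model formula on one random 80/20 split.
split_mae <- function(formula, seed) {
  set.seed(seed)
  idx <- sample(nrow(mtcars), size = floor(0.8 * nrow(mtcars)))
  fit <- lm(formula, data = mtcars[idx, ])
  pred <- predict(fit, newdata = mtcars[-idx, ])
  mean(abs(mtcars$mpg[-idx] - pred))
}

seeds <- 1:50
mae_small <- sapply(seeds, function(s) split_mae(mpg ~ wt, s))
mae_large <- sapply(seeds, function(s) split_mae(mpg ~ wt + hp, s))

# Summarise across seeds rather than reporting a single seed's number.
summary_table <- data.frame(
  model    = c("mpg ~ wt", "mpg ~ wt + hp"),
  mean_mae = c(mean(mae_small), mean(mae_large)),
  sd_mae   = c(sd(mae_small), sd(mae_large))
)
print(summary_table)
```

If the difference between the two mean_mae values is small compared with the sd_mae columns, a ranking obtained from any one seed would not mean much.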
On March 22, 2022 9:03:21 AM PDT, Neha gupta <neha.bologna90 at gmail.com> wrote:
Thank you again Tim
d=readARFF("my data")
set.seed(123)
tr <- d[index, ]
ts <- d[-index, ]
ctrl <- trainControl(method = "repeatedcv",number=10)
set.seed(123)
ran_search <- train(lneff ~ ., data = tr,
method = "mlp",
tuneLength = 30,
metric = "MAE",
preProc = c("center", "scale", "nzv"),
trControl = ctrl)
getTrainPerf(ran_search)
Would it be good?
On Tue, Mar 22, 2022 at 4:34 PM Ebert,Timothy Aaron <tebert at ufl.edu> wrote:
My inclination is to follow Jeff's advice and put it at the beginning of the program. You can always experiment:
set.seed(42)
rnorm(5,5,5)
rnorm(5,5,5)
runif(5,0,3)
As long as the commands are executed in the order they are written, then the outcome is the same every time. set.seed() is giving you reproducible outcomes. However, the second rnorm() does not give you the same outcome as the first. So set.seed() starts at the same point, but if you want the first and second rnorm() call to give the same results you will need another set.seed(42).
Note also that it does not matter if you pause: run the above code as a chunk, or run each command individually, and you get the same result (as long as you do it in the sequence written). So, if you set the seed, run some code, take a break, come back and write some more code, you might get in trouble because R is still using the original set.seed() command. To solve this issue use
set.seed(Sys.time())
or
set.seed(NULL)
Some of this is just good programming style workflow:
  Import data
  Declare variables and constants (set.seed() typically goes here)
  Define functions
  Body of code
  Generate output
  Clean up (set.seed(NULL) would go here, along with removing unused variables and such)
Regards,
Tim

From: Neha gupta <neha.bologna90 at gmail.com>
Sent: Tuesday, March 22, 2022 10:48 AM
Subject: Re: How important is set.seed
Hello Tim
In some of the examples I see in the tutorials, they put the random seed just before the model training e.g train function in case of caret library. Should I follow this?
Best regards

On Tuesday, March 22, 2022, Ebert,Timothy Aaron <tebert at ufl.edu> wrote:
Ah, so maybe what you need is to think of "set.seed()" as a treatment in an experiment. You could use a random number generator to select an appropriate number of seeds, then use those seeds repeatedly in the different models to see how seed selection influences outcomes. I am not quite sure how many seeds would constitute a good sample. For me that would depend on what I find and how long a run takes.
In parallel processing you set seed in master and then use a random number generator to set seeds in each worker.
Tim

From: Neha gupta <neha.bologna90 at gmail.com>
Sent: Tuesday, March 22, 2022 6:33 AM
Subject: Re: How important is set.seed
Thank you all.
Actually I need set.seed because I have to evaluate the consistency of features selection generated by different models, so I think for this, it's recommended to use the seed.
Warm regards

On Tuesday, March 22, 2022, Ebert,Timothy Aaron <tebert at ufl.edu> wrote:
If you are using the program for data analysis then set.seed() is not necessary unless you are developing a reproducible example. In a standard analysis it is mostly counter-productive because one should then ask if your presented results are an artifact of a specific seed that you selected to get a particular result. However, in cases where you need a reproducible example, debugging a program, or specific other cases where you might need the same result with every run of the program then set.seed() is an essential tool.
Tim
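The advice about parallel work (seed once in the master, give each worker its own stream, never the same seed everywhere) can be sketched with the base parallel package. This is only an illustration under assumptions: the seed 2022 and the three "workers" are made up, and no real cluster is started; the loop simply derives the per-worker streams that a framework would hand out.

```r
library(parallel)

# Switch to a generator designed for multiple streams; keep the old kinds
# so the session can be restored afterwards.
old <- RNGkind("L'Ecuyer-CMRG")
set.seed(2022)                # one seed for the whole job (arbitrary value)

# Derive one stream per hypothetical worker from the master state.
n_workers <- 3
streams <- vector("list", n_workers)
s <- .Random.seed
for (w in seq_len(n_workers)) {
  streams[[w]] <- s
  s <- nextRNGStream(s)       # jump ahead to the next independent stream
}

# Simulate what each worker would draw when started from its own stream.
draws <- lapply(streams, function(seed) {
  assign(".Random.seed", seed, envir = globalenv())
  runif(3)
})
draws

RNGkind(old[1])               # restore the previous generator kind
```

With a real cluster object, parallel::clusterSetRNGStream() sets up the workers this way from a single seed, which is the "use the features of your parallel processing framework" route.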
That approach would start the trainControl method at set.seed(123) and it would start ran_search at set.seed(123).
I am not sure whether it would be good or not, especially in this context. I am not clear on how the results are being compared, but I could get some differences if one method had a few extra calls to an RNG (random number generator).
I would think it makes more sense to ask how approach 1 differs from approach 2 over a wide range of seeds. You are not testing the RNG, and I am not sure using the same seed for each model makes a difference unless the analysis is a paired samples approach. Might it be more effective to remove the initial set.seed() and then replace the second set.seed with set.seed(NULL) ?
Otherwise wrap this into a loop
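N1 <- 100
set.seed(123)
# set.seed() expects an integer, so truncate the uniform draws
seed1 <- as.integer(runif(N1, min = 20, max = 345689))
for (i in seq_len(N1)) {
  set.seed(seed1[i])
  # code for the first model goes here
  set.seed(seed1[i])
  # code for the second model goes here
}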
Or use set.seed(NULL) between the models.
You will need some variable to store the relevant results from each model, and some code to display the results. For the former I suggest setting up a matrix or two that can be indexed using the for loop index.
Tim
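A runnable sketch of the loop idea with result storage, in base R. Everything specific here is an assumption for illustration: the helper split_mae, the two lm formulas standing in for "the models", the mtcars data and the 70/30 split are not from the thread, and caret is deliberately left out so the example is self-contained.

```r
# Hypothetical stand-in for "the code": test-set MAE of one model formula
# on a random 70/30 split of the built-in mtcars data.
split_mae <- function(formula, data = mtcars) {
  idx <- sample.int(nrow(data), size = floor(0.7 * nrow(data)))
  fit <- lm(formula, data = data[idx, ])
  mean(abs(predict(fit, newdata = data[-idx, ]) - data$mpg[-idx]))
}

n_seeds <- 100
set.seed(123)                            # makes the list of seeds itself reproducible
seeds <- sample.int(1e6, n_seeds)        # integer seeds, one per iteration

# One row per seed, one column per model.
results <- matrix(NA_real_, nrow = n_seeds, ncol = 2,
                  dimnames = list(NULL, c("model1", "model2")))

for (i in seq_len(n_seeds)) {
  set.seed(seeds[i])                     # same seed, so both models get the same split
  results[i, "model1"] <- split_mae(mpg ~ wt)
  set.seed(seeds[i])
  results[i, "model2"] <- split_mae(mpg ~ wt + hp)
}

colMeans(results)                        # average performance per model
apply(results, 2, sd)                    # seed-to-seed spread per model
summary(results[, "model1"] - results[, "model2"])  # paired differences
```

Re-seeding before each model makes this a paired comparison: within a row both models see the same split, so the row-wise differences isolate the model effect from the split effect.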
So step 1 is not to compare models, rather to understand how the choice of seed influences final model output. Once you have a handle on this issue, then work at comparing models.
Tim

From: Neha gupta <neha.bologna90 at gmail.com>
Sent: Tuesday, March 22, 2022 12:19 PM
To: Bert Gunter <bgunter.4567 at gmail.com>
Cc: Ebert,Timothy Aaron <tebert at ufl.edu>; r-help at r-project.org
Subject: Re: [R] How important is set.seed
I read a paper two days ago (and that's why I then posted here about set.seed) which used interpretable machine learning. According to the authors, different explanations (of the black-box models) will be produced by the ML models if different seeds are used or never used.
On Tue, Mar 22, 2022 at 5:12 PM Bert Gunter <bgunter.4567 at gmail.com<mailto:bgunter.4567 at gmail.com>> wrote:
OK, I'm somewhat puzzled by this discussion. Maybe I'm just clueless. But...

1. set.seed() is used to make any procedure that uses R's pseudo-random number generator -- including, for example, sampling from a distribution, random data splitting, etc. -- "reproducible". That is, if the procedure is repeated *exactly,* by invoking set.seed() with its original argument values (once!) *before* the procedure begins, exactly the same results should be produced by the procedure. Full stop. It does not matter how many times random number generation occurs within the procedure thereafter -- R preserves the state of the rng between invocations (but see the notes in ?set.seed for subtle qualifications of this claim).

2. Hence, if no (pseudo-) random number generation is used, set.seed() is irrelevant. Full stop.

3. Hence, if you don't care about reproducibility (you should! -- if for no other reason than debugging), you don't need set.seed()

4. The "randomness" of any sequence of results from any particular set.seed() arguments (including further calls to the rng) is a complex issue. ?set.seed has some discussion of this, but one needs considerable expertise to make informed choices here. As usual, we untutored users should be guided by the expert recommendations of the Help file.

*** If anything I have said above is wrong, I would greatly appreciate a public response here showing my error.***

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
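Point 1 can be checked directly with a small base R experiment. The seed value 42 and the use of rnorm(5) are arbitrary choices for illustration.

```r
set.seed(42)            # seed once, before the procedure starts
a1 <- rnorm(5)
a2 <- rnorm(5)          # continues the stream, so it differs from a1

set.seed(42)            # repeat the procedure exactly
b1 <- rnorm(5)
b2 <- rnorm(5)

identical(a1, b1)       # TRUE: same first draw
identical(a2, b2)       # TRUE: same second draw
identical(a1, a2)       # FALSE: draws within one run still differ
```

One call to set.seed() at the top is enough to reproduce the whole sequence; re-seeding in the middle is only needed if two steps are meant to see the same random numbers.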
Not wrong, just mostly different words.
1) I think of reproducible code as something for teaching or sharing. It can be useful in debugging if I want help (one reason for sharing). In solo debugging my code, I have not used set.seed() -- at least not yet. However, my programs are all small, mostly less than 100 lines of code.
2) Agreed.
3) Agreed -- one needs to be very clear on why one is using set.seed(). In many situations it is undoing the purpose of using a random number generator.
4) Agreed -- this is why it is so important to publish the version of R and the package used when presenting results. A great deal of effort has gone into building and selecting a good RNG. Depending on how the RNG is used, a basic understanding of what defines "good" is valuable. If there are huge numbers of calls to the RNG then periodicity in the RNG may start making a difference.
Random.org might be another place for the OP to explore.
Tim
"rather to understand how the choice of seed influences final model output." No! Different seeds just produce different streams of (pseudo)-random numbers. Hence there cannot be any "understanding" of how "choice of seed" influences results. Presumably, what you meant is to characterize the variability in results from the procedure due to its incorporation of randomness in what it does. Re-read Jeff's last post. This does *not* require set.seed() at all. Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Tue, Mar 22, 2022 at 9:55 AM Ebert,Timothy Aaron <tebert at ufl.edu> wrote:
So step 1 is not to compare models, rather to understand how the choice of seed influences final model output. Once you have a handle on this issue, then work at comparing models.

Tim

From: Neha gupta <neha.bologna90 at gmail.com>
Sent: Tuesday, March 22, 2022 12:19 PM
To: Bert Gunter <bgunter.4567 at gmail.com>
Cc: Ebert,Timothy Aaron <tebert at ufl.edu>; r-help at r-project.org
Subject: Re: [R] How important is set.seed

I read a paper two days ago (and that's why I then posted here about set.seed) which used interpretable machine learning. According to the authors, different explanations (of the black-box models) will be produced by the ML models if different seeds are used or never used.

On Tue, Mar 22, 2022 at 5:12 PM Bert Gunter <bgunter.4567 at gmail.com> wrote:

OK, I'm somewhat puzzled by this discussion. Maybe I'm just clueless. But...

1. set.seed() is used to make any procedure that uses R's pseudo-random number generator -- including, for example, sampling from a distribution, random data splitting, etc. -- "reproducible". That is, if the procedure is repeated *exactly,* by invoking set.seed() with its original argument values (once!) *before* the procedure begins, exactly the same results should be produced by the procedure. Full stop. It does not matter how many times random number generation occurs within the procedure thereafter -- R preserves the state of the rng between invocations (but see the notes in ?set.seed for subtle qualifications of this claim).

2. Hence, if no (pseudo-) random number generation is used, set.seed() is irrelevant. Full stop.

3. Hence, if you don't care about reproducibility (you should! -- if for no other reason than debugging), you don't need set.seed().

4. The "randomness" of any sequence of results from any particular set.seed() arguments (including further calls to the rng) is a complex issue. ?set.seed has some discussion of this, but one needs considerable expertise to make informed choices here. As usual, we untutored users should be guided by the expert recommendations of the Help file.

*** If anything I have said above is wrong, I would greatly appreciate a public response here showing my error. ***

Bert Gunter

On Tue, Mar 22, 2022 at 7:48 AM Neha gupta <neha.bologna90 at gmail.com> wrote:
Hello Tim

In some of the examples I see in the tutorials, they put the random seed just before the model training, e.g. the train function in the case of the caret library. Should I follow this?

Best regards

On Tuesday, March 22, 2022, Ebert,Timothy Aaron <tebert at ufl.edu> wrote:
Ah, so maybe what you need is to think of "set.seed()" as a treatment in an experiment. You could use a random number generator to select an appropriate number of seeds, then use those seeds repeatedly in the different models to see how seed selection influences outcomes. I am not quite sure how many seeds would constitute a good sample. For me that would depend on what I find and how long a run takes.

In parallel processing you set seed in master and then use a random number generator to set seeds in each worker.

Tim

From: Neha gupta <neha.bologna90 at gmail.com>
Sent: Tuesday, March 22, 2022 6:33 AM
To: Ebert,Timothy Aaron <tebert at ufl.edu>
Cc: Jeff Newmiller <jdnewmil at dcn.davis.ca.us>; r-help at r-project.org
Subject: Re: How important is set.seed

Thank you all.

Actually I need set.seed because I have to evaluate the consistency of features selection generated by different models, so I think for this, it's recommended to use the seed.

Warm regards

On Tuesday, March 22, 2022, Ebert,Timothy Aaron <tebert at ufl.edu> wrote:
If you are using the program for data analysis then set.seed() is not necessary unless you are developing a reproducible example. In a standard analysis it is mostly counter-productive because one should then ask if your presented results are an artifact of a specific seed that you selected to get a particular result. However, in cases where you need a reproducible example, debugging a program, or specific other cases where you might need the same result with every run of the program then set.seed() is an essential tool.

Tim
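Bert's point 1 and the question of where to put the seed (before the split, or just before training) can be checked directly. What follows is only a minimal sketch in base R, not anyone's recommended workflow: `split_data` and `train_model` are hypothetical helpers, `mtcars` and the seed value 42 are arbitrary choices, and a bootstrap resample stands in for whatever random step a real learner might perform.

```r
# Hypothetical helpers: each one consumes random numbers.
split_data <- function(n, n_train) sample.int(n, size = n_train)

train_model <- function(data, train_idx) {
  # A bootstrap resample stands in for a stochastic training step.
  boot_idx <- sample(train_idx, replace = TRUE)
  coef(lm(mpg ~ wt, data = data[boot_idx, ]))
}

n <- nrow(mtcars)

# (a) Seed once, before the first random step of the procedure.
set.seed(42)
idx_a1 <- split_data(n, 22)
fit_a1 <- train_model(mtcars, idx_a1)

set.seed(42)
idx_a2 <- split_data(n, 22)
fit_a2 <- train_model(mtcars, idx_a2)

# Both the split and the fit repeat exactly.
cat("split reproduced:", identical(idx_a1, idx_a2), "\n")
cat("fit reproduced:  ", identical(fit_a1, fit_a2), "\n")

# (b) Seed only just before training: the split is drawn from whatever
# state the RNG happens to be in, so it is not tied to the seed.
idx_b1 <- split_data(n, 22)
set.seed(42)
fit_b1 <- train_model(mtcars, idx_b1)

idx_b2 <- split_data(n, 22)
set.seed(42)
fit_b2 <- train_model(mtcars, idx_b2)

# The two splits are expected to differ, and the fits with them.
cat("split reproduced:", identical(idx_b1, idx_b2), "\n")
```

If the goal is a reproducible example, this suggests seeding once before the first step that draws random numbers. A seed placed only before the training call leaves the earlier split uncontrolled.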
I would also disagree with your rephrasing. What is the point in characterizing if there is no understanding? What one wants is to understand the variability in outcome caused by including a random element in the model if the focus is on the random numbers. It may also be that one wants to understand the variability in outcome if one were to repeat an experiment.

One approach is to split a dataset into testing and training sets, and use the RNG to decide which observation goes into which set. However, every run will give a slightly different answer. The random number generator is then used in place of a permutation test where the number of permutations is too large for current computational effort.

I assume what the OP was asking is whether the conclusion(s) of two (or more) models were the same given the range in outcomes produced by the random number generator(s). The only way to address this is to characterize the distribution of model outcomes from different runs with different random seeds. Examine that characterization and hope for understanding.

Tim

From: Bert Gunter <bgunter.4567 at gmail.com>
Sent: Tuesday, March 22, 2022 2:03 PM
To: Ebert,Timothy Aaron <tebert at ufl.edu>
Cc: Neha gupta <neha.bologna90 at gmail.com>; r-help at r-project.org
Subject: Re: [R] How important is set.seed

"rather to understand how the choice of seed influences final model output."

No! Different seeds just produce different streams of (pseudo)-random numbers. Hence there cannot be any "understanding" of how "choice of seed" influences results. Presumably, what you meant is to characterize the variability in results from the procedure due to its incorporation of randomness in what it does. Re-read Jeff's last post. This does *not* require set.seed() at all.

Bert Gunter
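One possible reading of Tim's suggestion, to characterize the distribution of outcomes for two models over many random train/test splits, is sketched below in base R. Everything specific here is an assumption made for illustration: the `mtcars` data, the two formulas, the helper `test_rmse`, 200 runs, and the choice to give both models the same split in each run so that their difference can be compared run by run. In line with Bert's remark, no set.seed() is needed for this (one would add a single call at the top only to make the exact numbers repeatable).

```r
# Two hypothetical candidate models for mpg.
formulas <- list(small = mpg ~ wt, large = mpg ~ wt + hp)

# Test-set RMSE of one model for one given split.
test_rmse <- function(formula, data, train_idx) {
  fit  <- lm(formula, data = data[train_idx, ])
  test <- data[-train_idx, ]
  sqrt(mean((test$mpg - predict(fit, newdata = test))^2))
}

n       <- nrow(mtcars)
n_train <- 22
n_runs  <- 200

# One random split per run, shared by both models; rows = runs, columns = models.
rmse <- t(replicate(n_runs, {
  train_idx <- sample.int(n, size = n_train)
  vapply(formulas, test_rmse, numeric(1), data = mtcars, train_idx = train_idx)
}))

# Run-to-run spread of each model's test RMSE.
print(round(apply(rmse, 2, quantile, probs = c(0.025, 0.5, 0.975)), 2))

# Paired difference per split: positive values favour the larger model.
rmse_diff <- rmse[, "small"] - rmse[, "large"]
cat("mean difference:", round(mean(rmse_diff), 2), "\n")
cat("share of splits where the larger model has lower RMSE:",
    mean(rmse_diff > 0), "\n")
```

If the ranking of the two models flips in a sizeable share of splits, a conclusion drawn from any single run, seeded or not, should probably not be trusted.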