Dear friends, Hope you are doing great. I want to fit a logistic regression in R, where the dependent variable is the covid status (I used 1 for covid positives, and 0 for covid negatives), but when I ran the glm, R complains that I should make the dependent variable a factor. What would be more advisable, to keep the dependent variable with 1s and 0s, or code it as yes/no and then make it a factor? Any guidance will be greatly appreciated, Best regards, Paul
Dependent Variable in Logistic Regression
17 messages · Rich Shepard, Paul Bernal, Patrick (Malone Quantitative) +5 more
x <- factor(0:1)
x <- factor(c("no", "yes"))
will produce identical results up to labeling.
Bert Gunter
"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Sat, Aug 1, 2020 at 10:40 AM Paul Bernal <paulbernal07 at gmail.com> wrote:
Dear friends,
Hope you are doing great. I want to fit a logistic regression in R, where
the dependent variable is the covid status (I used 1 for covid positives,
and 0 for covid negatives), but when I ran the glm, R complains that I
should make the dependent variable a factor.
What would be more advisable, to keep the dependent variable with 1s and
0s, or code it as yes/no and then make it a factor?
Any guidance will be greatly appreciated,
Best regards,
Paul
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
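The claim that the two codings give identical results up to labeling can be checked with a small simulation. This is only a sketch: the data, the seed, and every variable name below are invented for illustration and do not come from the original post.

```r
# Simulated data; all names and values here are made up for illustration.
set.seed(1)
n <- 200
x <- rnorm(n)
y01 <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))

# Two codings of the same response.
y_num_factor <- factor(y01)                                         # levels "0", "1"
y_yesno      <- factor(y01, levels = 0:1, labels = c("no", "yes"))  # levels "no", "yes"

fit_num   <- glm(y_num_factor ~ x, family = binomial)
fit_yesno <- glm(y_yesno ~ x, family = binomial)

# In both fits the first level ("0" or "no") is treated as failure,
# so the estimated coefficients should agree.
print(all.equal(coef(fit_num), coef(fit_yesno)))
```

Only the level labels differ between the two fits; what matters is which level comes first, since that level is taken as "failure".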
On Sat, 1 Aug 2020, Paul Bernal wrote:
Hope you are doing great. I want to fit a logistic regression in R, where the dependent variable is the covid status (I used 1 for covid positives, and 0 for covid negatives), but when I ran the glm, R complains that I should make the dependent variable a factor. What would be more advisable, to keep the dependent variable with 1s and 0s, or code it as yes/no and then make it a factor?
Paul,

1 or 0 are equivalent to yes or no, success or failure. All are nominal variables, so all should be factors, regardless of the coding.

Rich
Hi Bert,

Thank you for the kind reply. But what if I don't turn the variable into a factor? Let's say that in Excel I just coded the variable as 1s and 0s, imported the dataset into R, and fitted the logistic regression without turning any categorical variable or dummy variable into a factor. Does R require every dummy variable to be treated as a factor?

Best regards,
Paul

On Sat, Aug 1, 2020 at 12:59 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> x <- factor(0:1)
> x <- factor(c("no", "yes"))
> will produce identical results up to labeling.
You appear to be confusing a binomial **response** with categorical "dependent variables." glm() of course fits continuous or categorical dependent variables. If a continuous dependent variable has only 2 values, the results for glm() will be the same whether or not it is considered to be continuous or categorical, though you may not recognize it as such.

This discussion has already wandered off topic to statistical issues. I will not comment further on or off list. I suggest you consult a good reference on linear/generalized linear models or talk with a local statistician.

Bert Gunter
On Sat, Aug 1, 2020 at 11:04 AM Paul Bernal <paulbernal07 at gmail.com> wrote:
> But what if I don't turn the variable into a factor? [...] Does R require every dummy variable to be treated as a factor?
Sorry, typo. My first sentences should read:

"You appear to be confusing a binomial **response** with categorical "independent variables." glm() of course fits continuous or categorical independent variables."

Bert Gunter
On Sat, Aug 1, 2020 at 11:11 AM Bert Gunter <bgunter.4567 at gmail.com> wrote:
> You appear to be confusing a binomial **response** with categorical "dependent variables." glm() of course fits continuous or categorical dependent variables. [...]
... and further: "If a continuous independent variable has only 2 values, ..."

Bert Gunter
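The corrected statement (a two-valued independent variable gives the same fit whether it is treated as continuous or as categorical) can be checked along these lines. This is a sketch on simulated data; the predictor name g, the seed, and the coefficients are assumptions made for the example.

```r
# Simulated data; names and parameter values are invented for illustration.
set.seed(2)
n <- 300
g <- rbinom(n, 1, 0.5)                     # two-valued predictor coded 0/1
y <- rbinom(n, 1, plogis(-1 + 0.8 * g))    # binary response

fit_numeric <- glm(y ~ g, family = binomial)          # g treated as continuous
fit_factor  <- glm(y ~ factor(g), family = binomial)  # g treated as categorical

# The coefficient labels differ, which is why the equivalence
# may not be recognized at first glance.
print(names(coef(fit_numeric)))
print(names(coef(fit_factor)))

# The estimates and the fitted probabilities are the same.
print(all.equal(unname(coef(fit_numeric)), unname(coef(fit_factor))))
print(all.equal(fitted(fit_numeric), fitted(fit_factor)))
```

With a 0/1 coding the coefficients themselves match. With another two-value coding (say 1/2) the intercept would be parameterized differently, but the fitted probabilities would still agree.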
No, R does not. glm() does in order to do logistic regression.
On Sat, Aug 1, 2020 at 2:11 PM Paul Bernal <paulbernal07 at gmail.com> wrote:
> Does R require every dummy variable to be treated as a factor?

Patrick S. Malone, Ph.D., Malone Quantitative
NEW Service Models: http://malonequantitative.com
He/Him/His
... yes, but so does lm() for a categorical **INdependent** variable with more than 2 numerically labeled levels. n levels = (n-1) df for a categorical covariate, but 1 for a continuous one (unless more complex models are explicitly specified, of course).

As I said, the OP seems confused about whether he is referring to the response or covariates. Or maybe he just made the same typo I did.

Bert Gunter

On Sat, Aug 1, 2020 at 11:15 AM Patrick (Malone Quantitative) <malone at malonequantitative.com> wrote:
> No, R does not. glm() does in order to do logistic regression.
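The degrees-of-freedom point (n levels cost n-1 df as a categorical covariate but 1 df as a continuous one) can be illustrated with a three-level, numerically labeled predictor. This is a sketch on simulated data; the variable name dose and all values are hypothetical. It uses glm(), and the same counting applies to lm().

```r
# Simulated data; names and parameter values are invented for illustration.
set.seed(3)
n <- 300
dose <- sample(1:3, n, replace = TRUE)       # three numerically labeled levels
y <- rbinom(n, 1, plogis(-1 + 0.5 * dose))

fit_continuous  <- glm(y ~ dose, family = binomial)          # a single slope
fit_categorical <- glm(y ~ factor(dose), family = binomial)  # (3 - 1) contrasts

# Number of estimated coefficients, intercept included.
print(length(coef(fit_continuous)))
print(length(coef(fit_categorical)))

# Residual degrees of freedom differ by one accordingly.
print(c(fit_continuous$df.residual, fit_categorical$df.residual))
```

Unlike the two-valued case, these are genuinely different models, so here the choice between numeric and factor coding of a covariate changes the fit.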
Dear friend,

I am aware that I have a binomial dependent variable, which is covid status (1 if covid positive, and 0 otherwise). My question was whether R requires a binomial response variable to be turned into a factor or not, that's all.

Cheers,
Paul

On Sat, Aug 1, 2020 at 1:22 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> ... yes, but so does lm() for a categorical **INdependent** variable with more than 2 numerically labeled levels. [...]
I didn't mean to imply that was the only time that it was required, only that it's not universal in R.
On Sat, Aug 1, 2020 at 2:22 PM Bert Gunter <bgunter.4567 at gmail.com> wrote:
> ... yes, but so does lm() for a categorical **INdependent** variable with more than 2 numerically labeled levels. [...]
Hello,
From the documentation, help('glm'):
Details

A typical predictor has the form response ~ terms where response is the
(numeric) response vector and terms is a series of terms which specifies
a linear predictor for response.

For binomial and quasibinomial families the response can also be
specified as a factor (when the first level denotes failure and all
others success) or as a two-column matrix with the columns giving the
numbers of successes and failures. A terms specification of the form
first + second indicates all the terms in first together with all the
terms in second with any duplicates removed.
There is no need for the response to be a factor; it is optional. The
wording is very clear:

"For binomial and quasibinomial families the response *can* also be
specified as a factor"

And with binary, numeric responses I cannot reproduce the warning
message; the models fit silently.
Hope this helps,
Rui Barradas
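The two alternative response forms mentioned in the documentation excerpt can be sketched as follows. The data are simulated or made up, and all object names (status_neg_first, grouped, and so on) are hypothetical. The first part shows why the level order of a factor response matters; the second shows the two-column matrix form for grouped counts.

```r
# Simulated data; names and values are invented for illustration.
set.seed(4)
n <- 200
x <- rnorm(n)
y01 <- rbinom(n, 1, plogis(0.3 + x))

# Factor response: the FIRST level denotes failure. Putting "positive"
# first therefore models the probability of "negative" instead.
status_neg_first <- factor(y01, levels = 0:1, labels = c("negative", "positive"))
status_pos_first <- relevel(status_neg_first, ref = "positive")

fit_numeric   <- glm(y01 ~ x, family = binomial)
fit_neg_first <- glm(status_neg_first ~ x, family = binomial)
fit_pos_first <- glm(status_pos_first ~ x, family = binomial)

print(all.equal(coef(fit_numeric), coef(fit_neg_first)))   # same model
print(all.equal(coef(fit_numeric), -coef(fit_pos_first)))  # signs flipped

# Two-column matrix response: successes and failures per group.
grouped <- data.frame(dose      = c(1, 2, 3, 4),
                      successes = c(2, 5, 9, 14),
                      failures  = c(18, 15, 11, 6))
fit_grouped <- glm(cbind(successes, failures) ~ dose,
                   family = binomial, data = grouped)
print(coef(fit_grouped))
```

For a 0/1 outcome such as the covid status in the original question, the default level order of factor() puts "0" first, so the fit models the probability of a positive either way.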
At 18:39 on 01/08/2020, Paul Bernal wrote:
> [...] when I ran the glm, R complains that I should make the dependent variable a factor. [...]
On 2/08/20 5:39 am, Paul Bernal wrote:
Dear friends, Hope you are doing great. I want to fit a logistic regression in R, where the dependent variable is the covid status (I used 1 for covid positives, and 0 for covid negatives), but when I ran the glm, R complains that I should make the dependent variable a factor. What would be more advisable, to keep the dependent variable with 1s and 0s, or code it as yes/no and then make it a factor? Any guidance will be greatly appreciated,
There have been many responses to this post, the majority of them being
confusing and off the point.
BOTTOM LINE: R/glm() does *NOT* complain that one "should make the
dependent variable a factor". This is bovine faecal output.
As Rui Barradas has pointed out (alternatively: RTFM!) when you fit a
Bernoulli model using glm(), your response/dependent variable is allowed
to be
* a numeric variable with values 0 or 1
* a logical variable
* a factor with two levels
The OP presumably fed glm() a *character* vector with values "0" and
"1". Doing *this* will cause glm() to whinge.
I reiterate: RTFM!!! (And perhaps learn to distinguish between
character vectors and factors.)
cheers,
Rolf Turner
Honorary Research Fellow Department of Statistics University of Auckland Phone: +64-9-373-7599 ext. 88276
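The three accepted response types listed above, and the character-vector case, can be tried side by side. This is a sketch on simulated data with invented names. The character fit is wrapped in tryCatch() so the script keeps running; the exact wording of the error is not assumed here and may differ between R versions.

```r
# Simulated data; names and values are invented for illustration.
set.seed(5)
n <- 200
x <- rnorm(n)
y_num <- rbinom(n, 1, plogis(0.5 * x))   # numeric 0/1
y_lgl <- y_num == 1                       # logical
y_fac <- factor(y_num)                    # factor with two levels
y_chr <- as.character(y_num)              # character "0"/"1"

# The three accepted forms give the same coefficients.
fits <- list(
  numeric = glm(y_num ~ x, family = binomial),
  logical = glm(y_lgl ~ x, family = binomial),
  factor  = glm(y_fac ~ x, family = binomial)
)
print(sapply(fits, coef))

# A character response is not accepted; capture the error message.
chr_result <- tryCatch(
  glm(y_chr ~ x, family = binomial),
  error = function(e) conditionMessage(e)
)
print(chr_result)
```

If a response column arrives as character (for example after importing a spreadsheet), checking class() or str() on it and converting with as.numeric() or factor() before fitting avoids the problem.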
That's a bit harsh.
Isn't the best advice here, to post a reproducible example...
Which I believe has been mentioned.
Also, I'd strongly encourage people to use package+function name, for
this sort of thing.
stats::glm
As there are many R functions for GLMs...
On Sun, Aug 2, 2020 at 12:47 PM Rolf Turner <r.turner at auckland.ac.nz> wrote:
> I reiterate: RTFM!!! (And perhaps learn to distinguish between character vectors and factors.)
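For readers unsure what "package+function name" looks like in practice, here is a minimal sketch of a namespace-qualified call. The data are simulated and the object names are invented. The check on where a bare glm comes from assumes nothing in the session has masked it.

```r
# Simulated data; names and values are invented for illustration.
set.seed(6)
d <- data.frame(x = rnorm(100))
d$y <- rbinom(100, 1, plogis(d$x))

# Which package does a bare `glm` resolve to in this session?
print(environmentName(environment(glm)))

# Namespace-qualified call: refers to the stats implementation
# regardless of what other attached packages define.
fit <- stats::glm(y ~ x, family = stats::binomial, data = d)
print(class(fit))
```

In a post to the list, writing stats::glm (or simply showing the exact call that was run) removes any doubt about which function produced a message.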
1 day later
Abby Spurdle
on Sun, 2 Aug 2020 15:13:51 +1200 writes:
> That's a bit harsh. Isn't the best advice here, to post a
> reproducible example... Which I believe has been
> mentioned.
> Also, I'd strongly encourage people to use
> package+function name, for this sort of thing.
> stats::glm
> As there are many R functions for GLMs...
Sorry, Abby, I do disagree here (strongly enough as to warrant
this reply):

We're talking about doing "basic" statistics with R, and these
functions in the stats package have been part of R even before
R got a version number.

So, no, glm() {and the stats package} are the default and I still
think everybody should know and assume that.
Martin
> Sorry, Abby, I do disagree here (strongly enough as to warrant this reply):

Which part are you disagreeing with? That unambiguous names/references should be used, or that there are many R functions for GLMs? The wording of your post suggests (kind of) that there is only one R function for GLMs.

> We're talking about doing "basic" statistics with R, and these functions in the stats package have been part of R even before R got a version number.

Remember, not everyone is using the same R packages as you. And some people have done university courses, or online courses, etc., in R, without ever using one function from the stats package. I'm reluctant to assume that all R users will have a common understanding. And what may seem obvious to you or me may seem quite foreign to some users, or vice versa.

> So, no, glm() {and the stats package} are the default and I still
> think everybody should know and assume that.

But perhaps most importantly, the OP said "the glm". He never said "glm()", but rather the subsequent posters did. Rolf suggested his post was bullshit, after removing the lexical peroxide. How does anyone know that it wasn't a genuine post, but in reference to something other than stats::glm? Shouldn't people be innocent until proven guilty? Otherwise (something I have been guilty of in the past), the mailing list turns into statistical propaganda...

Even if the OP was referring to stats::glm, I'm still inclined to feel the post was legitimate, just a bit short on technical details...
All: Kindly take this offline please.

Bert Gunter
On Mon, Aug 3, 2020 at 12:39 PM Abby Spurdle <spurdle.a at gmail.com> wrote:
> Even if the OP was referring to stats::glm, I'm still inclined to feel the post was legitimate, just a bit short on technical details...