logistic regression using "glm": which "y" is set to be "1"? - R-help

Bin Yue

Wed, Dec 5, 2007 6:06 PM

Dear friends:
    Using the "glm" function with family=binomial, I got a list of
coefficients.
The coefficients reflect the effects of the predictor variables on the
probability of the response being "1".
My response variable consists of "A" and "D". I don't know which level of
the response was set to be 1.
Is the first element of the response set to be 1?
   Thanks to all in advance.
   Regards,

-----
Best regards,
Bin Yue

*************
Student in a Master's program at South Botanical Garden, CAS

View this message in context: http://www.nabble.com/logistic-regression-using-%22glm%22%2Cwhich-%22y%22-is-set-to-be-%221%22-tf4953617.html#a14185060
Sent from the R help mailing list archive at Nabble.com.

Marc Schwartz

Wed, Dec 5, 2007 6:47 PM

On Wed, 2007-12-05 at 18:06 -0800, Bin Yue wrote:

As per the Details section of ?glm:

For binomial and quasibinomial families the response can also be
specified as a factor (when the first level denotes failure and all
others success) ...


So use:

  levels(response.variable)

and that will give you the factor levels, where the first level is coded
as 0 (failure) and the second level as 1 (success).

If you work in a typical English-based locale with the default alphabetical
level ordering, A (Alive?) will likely be 0 and D (Dead?) will be 1.
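A minimal sketch of that coding, using a small made-up response (the vector `status` and its values are hypothetical). It fits an intercept-only model so you can see the 0/1 vector glm() actually used and confirm that the fitted probability refers to the second level, "D". relevel() is one way to make "D" the reference level if you want to model the probability of "A" instead.

```r
# Hypothetical response with levels "A" and "D" (made-up data)
status <- factor(c("A", "D", "D", "A", "D", "D"))

# Levels are sorted alphabetically here, so "A" comes first
levels(status)

# Intercept-only logistic model; glm() codes the first level ("A") as 0
fit <- glm(status ~ 1, family = binomial)
fit$y                  # the 0/1 response glm() actually fitted
plogis(coef(fit))      # modelled P(status == "D"), here 4/6

# To model P(status == "A") instead, make "D" the first (reference) level
status2 <- relevel(status, ref = "D")
levels(status2)
```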

HTH,

Marc Schwartz

Bin Yue

Wed, Dec 5, 2007 7:41 PM

Dear Marc Schwartz:
 When I ask R 2.6.0 for Windows for help, the information it gives does not
contain much about family=binomial.
 You said that there is a Details section in "?glm". I want to read it
thoroughly. Could you tell me where and how I can find the Details section
of "?glm"?
   Thank you very much.
   Best regards,
 Bin Yue

Marc Schwartz wrote:

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Weiwei Shi

Wed, Dec 5, 2007 7:54 PM

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: https://stat.ethz.ch/pipermail/r-help/attachments/20071205/4e5741a3/attachment.pl

Bin Yue

Wed, Dec 5, 2007 10:33 PM

Dear all:
     By comparing glmresult$y and model.response(model.frame(glmresult)), I
have found out which level is set to "TRUE" and which to "FALSE". But it
seems that to fit a logistic regression, a logit (or logistic)
transformation has to be done before the regression.
     Does anybody know how to obtain the result of that transformation? It is
hard for me to feel settled before knowing what R actually does. I have read
some books and the "?glm" help file, but what they told me was not
sufficient.
   Best wishes,
 Bin Yue
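The comparison Bin describes can be sketched like this. The data frame `dat` and its columns are made up for illustration; only `glmresult$y`, `model.frame()` and `model.response()` come from the message above. Cross-tabulating the original factor against the fitted 0/1 vector shows which level became 1.

```r
# Hypothetical data: factor response with levels "A" and "D", one predictor x
set.seed(42)
dat <- data.frame(
  status = factor(rep(c("A", "D", "D"), length.out = 30)),
  x = rnorm(30)
)
glmresult <- glm(status ~ x, data = dat, family = binomial)

# The response as stored in the model frame (still a factor) ...
orig <- model.response(model.frame(glmresult))
# ... and the 0/1 vector glm() actually fitted
coded <- glmresult$y

# Cross-tabulate: "A" (first level) should pair with 0, "D" with 1
table(orig, coded)
```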

Weiwei Shi wrote:

Dear Bin:
You type
?glm
in the R console and you will find the Details section of the help file for
glm.

I pasted it for you too:

Details

A typical predictor has the form response ~ terms where response is the
(numeric) response vector and terms is a series of terms which specifies a
linear predictor for response. For binomial and quasibinomial families the
response can also be specified as a factor (when the first level denotes
failure and all others success) or as a two-column matrix with the columns
giving the numbers of successes and failures. A terms specification of the
form first + second indicates all the terms in first together with all the
terms in second with duplicates removed. The terms in the formula will be
re-ordered so that main effects come first, followed by the interactions,
all second-order, all third-order and so on: to avoid this pass a terms
object as the formula.
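To see that the factor form and the two-column matrix form describe the same model, here is a small sketch with made-up grouped counts (the names `dose`, `successes`, `failures` and all the numbers are hypothetical). Expanding the counts to one row per trial and fitting with a factor response should give the same coefficient estimates as cbind(successes, failures).

```r
# Hypothetical grouped data: successes and failures at three doses
dose      <- c(1, 2, 3)
successes <- c(2, 5, 8)
failures  <- c(8, 5, 2)

# (1) Two-column matrix response: successes first, failures second
fit_matrix <- glm(cbind(successes, failures) ~ dose, family = binomial)

# (2) Same data expanded to one row per trial with a factor response;
#     the first level ("failure") is treated as 0, the other as 1
outcome <- factor(rep(rep(c("failure", "success"), 3),
                      times = as.vector(rbind(failures, successes))),
                  levels = c("failure", "success"))
dose_long <- rep(dose, times = successes + failures)
fit_factor <- glm(outcome ~ dose_long, family = binomial)

coef(fit_matrix)
coef(fit_factor)   # same estimates; only the slope's name differs
```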

A specification of the form first:second indicates the set of terms
obtained by taking the interactions of all terms in first with all terms in
second. The specification first*second indicates the *cross* of first and
second. This is the same as first + second + first:second.
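These formula rules can be checked without fitting anything: terms() expands a formula, and its "term.labels" attribute lists the resulting terms. Here y, a and b are placeholder names, so no data is needed; keep.order = TRUE is the terms() argument that suppresses the re-ordering of main effects and interactions.

```r
# Expand formula specifications; y, a and b are placeholders (no data needed)
attr(terms(y ~ a + b), "term.labels")
attr(terms(y ~ a:b), "term.labels")
attr(terms(y ~ a * b), "term.labels")           # same as a + b + a:b
attr(terms(y ~ a:b + a + b), "term.labels")     # main effects moved first

# Keep the terms in the order they were written
attr(terms(y ~ a:b + a + b, keep.order = TRUE), "term.labels")
```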

glm.fit is the workhorse function.

If more than one of etastart, start and mustart is specified, the first in
the list will be used. It is often advisable to supply starting values for
a quasi family, and also for families with unusual links such as
gaussian("log").

All of weights, subset, offset, etastart and mustart are evaluated in the
same way as variables in formula, that is first in data and then in the
environment of formula.

On Dec 5, 2007 10:41 PM, Bin Yue <leffgh at 163.com> wrote:



Marc Schwartz

Thu, Dec 6, 2007 10:11 AM

On Wed, 2007-12-05 at 22:33 -0800, Bin Yue wrote:

Bin,

I may be misinterpreting your follow-up query, but here goes:

You have presumably created a logistic regression model. The resultant
model object is called 'glmresult'.

If you use:

  fitted(glmresult)

it will return the fitted values on the probability scale (0 to 1) for
the original set of data that you used.

You can also use:

  predict(glmresult, type = "response")

The advantage of using predict.glm() is that you can also apply the model
to new data, via its 'newdata' argument.
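A short sketch of predicting for new data, using the same ?infert model that appears in the example further down. The data frame `new_obs` and its values are hypothetical, chosen only for illustration; its column names must match the predictors in the model formula.

```r
model1 <- glm(case ~ spontaneous + induced, data = infert, family = binomial())

# Hypothetical new observations (values chosen only for illustration)
new_obs <- data.frame(spontaneous = c(0, 2), induced = c(0, 1))

p_new   <- predict(model1, newdata = new_obs, type = "response")  # probabilities
eta_new <- predict(model1, newdata = new_obs)                     # log-odds ("link" scale)
p_new
eta_new
```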


If you want the linear predicted values on a log-odds scale, you can
use:

  glmresult$linear.predictors

or more easily:

  predict(glmresult)

See ?fitted and ?predict.glm for more information.
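On the "transformation" question: glm() does not logit-transform the 0/1 response before fitting (logit(0) and logit(1) would be infinite). Instead it models the logit of the probability as a linear function of the predictors, and the transformation itself is stored in the family object. A sketch with the ?infert model: family(model1)$linkfun is the logit and $linkinv its inverse, which for the logit link should agree with qlogis() and plogis().

```r
model1 <- glm(case ~ spontaneous + induced, data = infert, family = binomial())

fam <- family(model1)
fam$link                          # "logit"

eta <- unname(predict(model1))    # linear predictor, log-odds scale
p   <- unname(fitted(model1))     # probability scale

# The inverse link (logistic function) maps log-odds to probabilities ...
all.equal(fam$linkinv(eta), p)
all.equal(plogis(eta), p)

# ... and the link (logit) maps probabilities back to log-odds
all.equal(fam$linkfun(p), eta)
all.equal(log(p / (1 - p)), eta)
```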


Let's use an example from ?infert:

> model1 <- glm(case ~ spontaneous + induced, data = infert, family = binomial())

# Summary of fitted values on a probability scale
> summary(fitted(model1))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.1534  0.1534  0.2949  0.3347  0.3750  0.7511 

# Same
> summary(predict(model1, type = "response"))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.1534  0.1534  0.2949  0.3347  0.3750  0.7511 

# Get log-odds scale values
> summary(predict(model1))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-1.7080 -1.7080 -0.8716 -0.7781 -0.5107  1.1050 

# Same
> summary(model1$linear.predictors)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-1.7080 -1.7080 -0.8716 -0.7781 -0.5107  1.1050 

If we wanted to do the log-odds scale to probability scale transform
manually, we could do:

> summary(exp(predict(model1)) / (1 + exp(predict(model1))))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.1534  0.1534  0.2949  0.3347  0.3750  0.7511 

Look familiar?

I would urge you to read through An Introduction to R, which is
available with your R installation or via the R web site under
Documentation. In addition, there are various books listed on the R web
site regarding model building and related subject matter. Which one you
choose can be a matter of taste, but two that I would recommend are:

William N. Venables and Brian D. Ripley. Modern Applied Statistics with
S. Fourth Edition. Springer, New York, 2002. ISBN 0-387-95457-0

Frank E. Harrell. Regression Modeling Strategies, with Applications to
Linear Models, Survival Analysis and Logistic Regression. Springer,
2001. ISBN 0-387-95232-2

HTH,

Marc Schwartz