
Regression and data types

5 messages · GRANT Lewis, PIKAL Petr, Daniel Malter +2 more

#
Dear All
I have three data sets, X1, X2 and Y. X1 is data, X2 and Y were
generated in (different) R programs. All three vectors have one column
of 60 data points.
I am using the code lm(Y~X1)$coef and lm(Y~X2)$coef. The first returns
two values, an intercept and a slope, but the second returns 60 values.
I suspect there is something in the "type" of X2 such that it forces the
regression to do something different, but I can't work this out.
Please help!
Lewis


**********************************************************************
Hermes Fund Managers Limited 
Registered in England No. 1661776, Lloyds Chambers, 1 Portsoken Street, London E1 8HZ

*** Please read the Hermes email disclaimer at http://www.hermes.co.uk/email_terms.htm before acting on this email or opening any attachment ***

The contents of this email are confidential.  If you have received this message in error, please delete it immediately and contact the sender directly or the Hermes IT Helpdesk on +44(0)20 7680 2117.  Any reliance on, use, disclosure, dissemination, distribution or copying of this email is unauthorised and strictly prohibited.

This message has been checked for viruses but the recipient is strongly advised to rescan the message before opening any attachments or attached executable files.  Hermes do not accept any liability for any damage sustained as a result of a virus introduced by this email or any attachment.


**********************************************************************


#
Hi

r-help-bounces at r-project.org wrote on 26.09.2008 14:17:59:
try

str(X2)

it is probably a character vector or a factor, so you need to convert it
to numeric.

see ?as.numeric and beware of factor properties (if X2 is factor)
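To illustrate the factor pitfall Petr warns about (a hypothetical X2, not the poster's actual data): as.numeric() on a factor returns the internal level codes, not the printed values, so a numeric-looking factor has to go through as.character() first.

```r
x2 <- factor(c("10", "20", "30"))  # hypothetical numeric-looking factor
str(x2)                            # Factor w/ 3 levels "10","20","30"
as.numeric(x2)                     # 1 2 3  -- the level codes, not the values
as.numeric(as.character(x2))       # 10 20 30 -- the intended values
```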

Regards
Petr
#
check 

is(X1)
is(X2)

to check whether the two variables have the same type. Most probably, X2 is
a factor (treated as dummy variables), so the 60 values from your second
regression are the intercept (the coefficient for the first level) plus 59
dummy coefficients for the offset between each of the other 59 levels and
the first.

run:

summary(lm(Y~X1))

summary(lm(Y~X2))

and you will see from the regression output that the second regression is
estimated with dummies for X2 rather than treating X2 as a numeric variable.
Transform X2 to numeric by:

X3=as.numeric(X2)

and check whether the values are otherwise equal to X2.

cbind(X2,X3)

If that's the case, rerun your analysis:

summary(lm(Y~X3))

and you will only get one intercept and a slope coefficient.

Cheers,
Daniel


-------------------------
cuncta stricte discussurus
------------------------- 

#
On Fri, 2008-09-26 at 13:17 +0100, GRANT Lewis wrote:
Others have replied with an answer to your question. I just wanted to
suggest you don't rummage around in R model objects, taking what you like
using '$'. Nine times out of ten you'll get what you want, but that last
remaining time will leave you rubbing your head in confusion at best; at
worst, the answers may be just plain wrong.

Use extractor functions, such as coef() instead:

coef(lm(Y ~ X1))

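One concrete case where '$' bites (a sketch with made-up data, not the poster's): for a glm fit, `fit$residuals` holds the *working* residuals, while the extractor residuals(fit) defaults to *deviance* residuals, so the two give silently different numbers.

```r
# Made-up data: '$residuals' on a glm gives working residuals,
# while residuals() defaults to deviance residuals.
fit <- glm(c(2, 3, 6, 7, 8) ~ c(1, 2, 3, 4, 5), family = poisson)
isTRUE(all.equal(fit$residuals, residuals(fit)))  # FALSE: different quantities
```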
G
#
On 27-Sep-08 15:18:32, Gavin Simpson wrote:
Surely, if you read (for example) in ?lm that:

Value:
[...]
     An object of class '"lm"' is a list containing at least the
     following components:

     coefficients: a named vector of coefficients
     [...]

then (subject to using enough of the name to give a unique partial
matching, as is the case here) you should find that

  lm(...)$coef

returns what ?lm says!

Even with more complex model fitting, such as lme (in 'nlme') you
should still get what ?lme --> ?lmeObject says:

coefficients: a list with two components, 'fixed' and 'random', where
          the first is a vector containing the estimated fixed effects
          and the second is a list of matrices with the estimated
          random effects for each level of grouping. For each matrix in
          the 'random' list, the columns refer to the random effects
          and the rows to the groups.

Since coef() is a generic function, you might expect coef(lme(...))
to give the same, or perhaps an even more easily interpreted
presentation. But there's nothing wrong with lme(...)$coef provided
you're fully aware of what it is (as explained in the help).

BUT: In the case of lme, if you run the example:

  fm1 <- lme(distance ~ age, data = Orthodont) # random is ~ age

then fm1$coef really does return a list with components:

  $fixed
  (Intercept)         age 
   16.7611111   0.6601852 

  $random
  $random$Subject
      (Intercept)          age
  M16  -0.1877576 -0.068853674
  [...]
  F04   1.0691628 -0.029862143
  F11   1.2176440  0.083191188

whereas if you do coef(fm1) you will get only:

      (Intercept)       age
  M16    16.57335 0.5913315
  [...]
  F04    17.83027 0.6303230
  F11    17.97876 0.7433764

so (a) not only do you NOT get the "fixed effects" part,
but (b) what you might interpret as the "random effects" part
has very different values from the "$random" component of
fm1$coef! Well, there must be some explanation for this, mustn't
there? But, in ?lmeObject, I find ONLY:

     Objects of this class have methods for the generic functions 
     'anova', 'coef', [...]

so I can't find out from there what coef(fm1) does; and if I look
in ?lme again I find only:

     The functions 'resid', 'coef',
     'fitted', 'fixed.effects', and 'random.effects'  can be used to
     extract some of its components.

so that doesn't tell me much (and certainly not why I get the
different results). And ?coef doesn't say anything specific either.

That being the case, I would, myself, stick with fm1$coef.
At least, then I know what I'm getting. With coef(fm1), I don't.

Of course, summary(fm1) is informative in its own way; but that's a
different matter; my point is that coef() doesn't necessarily give
you what you might expect, while (...)$coef does -- since it is
explained in full in ?lmeObject. This is, in fact, the opposite
conclusion to that drawn by Gavin!
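[For what it's worth, the numbers quoted above do hint at the explanation (my reading, not stated in the help pages cited): coef() on an lme fit appears to return the group-level coefficients, i.e. the fixed effects plus each group's random effects. A quick arithmetic check against the output above:]

```r
# Using the values printed above for subject M16:
fixed  <- c(16.7611111, 0.6601852)     # $fixed: (Intercept), age
random <- c(-0.1877576, -0.068853674)  # $random$Subject["M16", ]
round(fixed + random, 5)               # 16.57335 0.59133, matching coef(fm1)["M16", ]
```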

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 27-Sep-08                                       Time: 18:09:54
------------------------------ XFMail ------------------------------