Regression and data types
5 messages · GRANT Lewis, PIKAL Petr, Daniel Malter +2 more

GRANT Lewis wrote:

Dear All,

I have three data sets, X1, X2 and Y. X1 is data; X2 and Y were generated in (different) R programs. All three vectors have one column of 60 data points. I am using the code lm(Y~X1)$coef and lm(Y~X2)$coef. The first returns two values, an intercept and a slope, but the second returns 60 values. I suspect there is something in the "type" of X2 such that it forces the regression to do something different, but I can't work this out.

Please help!

Lewis
Hi,

r-help-bounces at r-project.org wrote on 26.09.2008 14:17:59:
Dear All I have three data sets, X1, X2 and Y. X1 is data, X2 and Y were generated in (different) R programs. All three vectors have one column of 60 data points. I am using the code lm(Y~X1)$coef and lm(Y~X2)$coef. The first returns two values, an intercept and a slope, but the second returns 60 values. I suspect there is something in the "type" of X2 such that it forces the regression to do something different, but I can't work this out.
Try str(X2); it is probably a character vector or a factor, so you need to convert it to numeric. See ?as.numeric and beware of factor properties (if X2 is a factor).

Regards,
Petr
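A minimal sketch of what str() typically reveals here, with made-up data. Note the factor trap Petr warns about: as.numeric() on a factor returns the internal level codes, not the printed values.

    x <- factor(c(1.2, 3.4, 5.6))    # numbers accidentally stored as a factor
    str(x)                           # Factor w/ 3 levels "1.2","3.4","5.6": 1 2 3
    as.numeric(x)                    # 1 2 3  -- the level codes, NOT the data
    as.numeric(as.character(x))      # 1.2 3.4 5.6  -- the actual values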
Please help! Lewis
Daniel Malter wrote:

Check is(X1) and is(X2) to investigate whether the types of the variables are equal. Most probably, X2 is a factor (treated as dummy variables), so that the 60 values from your second regression are the intercept (i.e. the coefficient for the first level) plus 59 dummies for the offset between each of the other 59 levels and the first. Run

    summary(lm(Y~X1))
    summary(lm(Y~X2))

and you will see from the regression output that the second regression is estimated with dummies for X2 rather than treating X2 as a numeric variable. Transform X2 to numeric by

    X3 <- as.numeric(X2)

and check whether the values are otherwise equal to X2:

    cbind(X2, X3)

If that's the case, rerun your analysis with

    summary(lm(Y~X3))

and you will get only one intercept and one slope coefficient.

Cheers,
Daniel
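A self-contained sketch reproducing the symptom Daniel describes; all data here are made up. With 60 distinct values stored as a factor, the model is saturated: one intercept plus 59 dummies.

    set.seed(1)
    Y  <- rnorm(60)
    X1 <- rnorm(60)
    X2 <- factor(rnorm(60))        # 60 numeric values stored as a factor
    length(coef(lm(Y ~ X1)))       # 2  (intercept and slope)
    length(coef(lm(Y ~ X2)))       # 60 (intercept plus 59 dummy coefficients)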
On Fri, 2008-09-26 at 13:17 +0100, GRANT Lewis wrote:
Dear All I have three data sets, X1, X2 and Y. X1 is data, X2 and Y were generated in (different) R programs. All three vectors have one column of 60 data points. I am using the code lm(Y~X1)$coef and lm(Y~X2)$coef.
Others have replied with an answer to your question. I just wanted to suggest that you don't rummage around in R model objects, taking what you like using '$'. Nine times out of ten you'll get what you want, but that last remaining time will have you rubbing your head in confusion at best; at worst, the answers may be just plain wrong. Use extractor functions, such as coef(), instead:

    coef(lm(Y ~ X1))

G
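One concrete case where '$' and the extractor silently disagree: for a glm fit, the $residuals component holds the working residuals from the IRLS fit, while residuals() returns deviance residuals by default. (The toy data below are made up; the point is only the mismatch.)

    x <- 1:6
    y <- c(0, 1, 0, 0, 1, 1)
    fit <- glm(y ~ x, family = binomial)
    fit$residuals       # working residuals (rarely what you want)
    residuals(fit)      # deviance residuals (the documented default)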
The first returns two values, an intercept and a slope, but the second returns 60 values. I suspect there is something in the "type" of X2 such that it forces the regression to do something different, but I can't work this out. Please help! Lewis
On 27-Sep-08 15:18:32, Gavin Simpson wrote:
Others have replied with an answer to your question. I just wanted to suggest you don't rummage around in R model objects taking what you like using '$'. 9 times out of 10 you'll get what you want, but that last remaining time will have you rubbing your head in confusion at best, at worst, answers may be just plain wrong. Use extractor functions, such as coef() instead: coef(lm(Y ~ X1)) G
Surely, if you read (for example) in ?lm that:
Value:
[...]
An object of class '"lm"' is a list containing at least the
following components:
coefficients: a named vector of coefficients
[...]
then (subject to using enough of the name to give a unique partial
matching, as is the case here) you should find that
lm(...)$coef
returns what ?lm says!
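Indeed, '$' does partial matching on list names, which is why $coef works at all; on any lm fit, for example:

    fit <- lm(dist ~ speed, data = cars)
    identical(fit$coef, fit$coefficients)   # TRUE: '$' partially matches 'coefficients'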
Even with more complex model fitting, such as lme (in 'nlme') you
should still get what ?lme --> ?lmeObject says:
coefficients: a list with two components, 'fixed' and 'random', where
the first is a vector containing the estimated fixed effects
and the second is a list of matrices with the estimated
random effects for each level of grouping. For each matrix in
the 'random' list, the columns refer to the random effects
and the rows to the groups.
Since coef() is a generic function, you might expect coef(lme(...))
to give the same, or even perhaps a more easily interpreted
presentation. But there's nothing wrong with lme(...)$coef provided
you're fully aware of what it is (as explained in the help).
BUT: In the case of lme, if you run the example:
fm1 <- lme(distance ~ age, data = Orthodont) # random is ~ age
then fm1$coef really does return a list with components:
$fixed
(Intercept) age
16.7611111 0.6601852
$random
$random$Subject
(Intercept) age
M16 -0.1877576 -0.068853674
[...]
F04 1.0691628 -0.029862143
F11 1.2176440 0.083191188
whereas if you do coef(fm1) you will get only:
(Intercept) age
M16 16.57335 0.5913315
[...]
F04 17.83027 0.6303230
F11 17.97876 0.7433764
so (a) not only do you NOT get the "fixed effects" part,
but (b) what you might interpret as the "random effects" part
has very different values from the "$random" component of
fm1$coef! Well, there must be some explanation for this, mustn't
there? But, in ?lmeObject, I find ONLY:
Objects of this class have methods for the generic functions
'anova', 'coef', [...]
so I can't find out from there what coef(fm1) does; and if I look
in ?lme again I find only:
The functions 'resid', 'coef',
'fitted', 'fixed.effects', and 'random.effects' can be used to
extract some of its components.
so that doesn't tell me much (and certainly not why I get the
different results). And ?coef doesn't say anything specific either.
That being the case, I would, myself, stick with fm1$coef.
At least, then I know what I'm getting. With coef(fm1), I don't.
Of course, summary(fm1) is informative in its own way; but that's a
different matter; my point is that coef() doesn't necessarily give
you what you might expect, while (...)$coef does -- since it is
explained in full in ?lmeObject. This is, in fact, the opposite
conclusion to that drawn by Gavin!
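For what it's worth, the numbers quoted above are consistent with coef(fm1) returning, for each Subject, the fixed effects plus that Subject's random effects (e.g. 16.7611 - 0.1878 = 16.5734 for M16's intercept); ?coef.lme in nlme describes it this way. A quick check, using the same example fit:

    library(nlme)
    fm1 <- lme(distance ~ age, data = Orthodont)  # random = ~ age | Subject
    coef(fm1)["M16", ]                            # 16.57335 0.5913315
    fixef(fm1) + unlist(ranef(fm1)["M16", ])      # the same numbers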
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 27-Sep-08 Time: 18:09:54