Greetings,
I'm having a problem with something that I think is very simple. I'd
like to use the 'sapply' and 'by' functions together in one function,
for example to get regression coefficients from multiple models by a
grouping variable. I think that I'm missing something that is probably
obvious to experienced users.
Here's a simple (trivial) example of what I'd like to do:
new <- data.frame(Outcome.1=rnorm(10),Outcome.2=rnorm(10),sex=rep(0:1,5),Pred=rnorm(10))
fxa <- function(x,data) { lm(x~Pred,data=data)$coef }
sapply(new[,1:2],fxa,new) # this yields coefficients for the predictor in separate models
fxb <- function(x) {lm(Outcome.1~Pred,data=x)$coef}
by(new,new$sex,fxb) # yields the coefficients for Outcome.1 for each sex
## I'd like to combine 'sapply' and 'by' to get the regression
## coefficients for Outcome.1 and Outcome.2 for each sex, rather than
## running fxb a second time to predict 'Outcome.2' or subsetting the
## data by sex before I run the function, but the following doesn't work:
by(new,new$sex,FUN=function(x)sapply(x[,1:2],fxa,new))
'Error in model.frame.default(formula = x ~ Pred, data = data,
drop.unused.levels = TRUE) :
variable lengths differ (found for 'Pred')'
## I understand the error message: the length of 'Pred' is 10 while
## each sex group has only 5 rows. But I'm not sure how to write the
## 'by' call correctly so that it uses 'sapply' inside it. Could someone
## please point me in the right direction? Thanks very much in advance.
David S Freedman, CDC (Atlanta USA) [definitely not the well-known
statistician, David A Freedman, in Berkeley]
Using 'sapply' and 'by' in one function
6 messages · David Freedman, Gabor Grothendieck, Hadley Wickham
By passing new to fxa via the second argument of fxa, new is not being
subsetted, hence the error. Try this:

by(new, new$sex, function(x) sapply(x[1:2], function(y) coef(lm(y ~ Pred, x))))

Actually, you can do the above without sapply, as lm can take a matrix
for the dependent variable:

by(new, new$sex, function(x) coef(lm(as.matrix(x[1:2]) ~ Pred, x)))
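As a rough, self-contained sketch of the two suggestions above: the seed, the helper name fit_group and the result names res_sapply and res_matrix are illustrative assumptions and not part of the original post. The key point is that lm() gets the per-group subset that by() passes in, so the response and Pred always have the same length.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
new <- data.frame(Outcome.1 = rnorm(10), Outcome.2 = rnorm(10),
                  sex = rep(0:1, 5), Pred = rnorm(10))

# d is the 5-row subset for one sex. Each outcome column y is regressed
# on the Pred column of that same subset, not of the full data frame.
fit_group <- function(d) {
  sapply(d[1:2], function(y) coef(lm(y ~ Pred, data = d)))
}

# One 2 x 2 coefficient matrix (intercept and slope by outcome) per sex
res_sapply <- by(new, new$sex, fit_group)

# The same coefficients from a single multi-response lm() per group
res_matrix <- by(new, new$sex,
                 function(d) coef(lm(as.matrix(d[1:2]) ~ Pred, data = d)))

print(res_sapply)
print(res_matrix)
```

Both calls should print the same intercepts and slopes; the matrix form simply fits both outcomes in one lm() call per group.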
On Feb 10, 2008 8:19 AM, David & Natalia <3.14david at gmail.com> wrote:
[...]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Actually thinking about this, not only do you not need sapply but you
don't even need by:

new2 <- transform(new, sex = factor(sex))
coef(lm(as.matrix(new2[1:2]) ~ sex/Pred - 1, new2))
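To make the nesting formula a little more concrete, here is a minimal sketch. The seed and the names fit_nested, cf and sep0 are assumptions for illustration. In R's formula syntax, sex/Pred expands to sex + sex:Pred, and dropping the intercept with - 1 leaves one intercept and one Pred slope per level of sex.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
new <- data.frame(Outcome.1 = rnorm(10), Outcome.2 = rnorm(10),
                  sex = rep(0:1, 5), Pred = rnorm(10))
new2 <- transform(new, sex = factor(sex))

# sex/Pred - 1 is shorthand for sex + sex:Pred - 1:
# a separate intercept (sex0, sex1) and slope (sex0:Pred, sex1:Pred) per sex
fit_nested <- lm(as.matrix(new2[1:2]) ~ sex/Pred - 1, data = new2)
cf <- coef(fit_nested)
print(cf)

# For comparison: Outcome.1 fitted on the sex == 0 rows only
sep0 <- coef(lm(Outcome.1 ~ Pred, data = subset(new2, sex == "0")))
print(sep0)
```

The sex0 and sex0:Pred rows of the Outcome.1 column should match the intercept and slope from the fit on the sex == 0 subset.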
On Feb 10, 2008 8:43 AM, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
[...]
On Feb 10, 2008 8:25 AM, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> Actually thinking about this, not only do you not need sapply but you
> don't even need by:
>
> new2 <- transform(new, sex = factor(sex))
> coef(lm(as.matrix(new2[1:2]) ~ sex/Pred - 1, new2))
Although that's a very slightly different model, as it assumes that
both sexes have the same error variance.
Hadley
On Feb 10, 2008 10:12 AM, hadley wickham <h.wickham at gmail.com> wrote:
> Although that's a very slightly different model, as it assumes that
> both sexes have the same error variance.
But the output is the coefficients, and they are identical.
For the sake of an example, I'm sure that David simply omitted the part
of his analysis where he looked at the standard errors as well ;)
Hadley
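The difference discussed in the last few messages can be sketched as follows. The seed and the names pooled, separate, se_pooled, se_sep, sig_pooled and sig_sep are assumptions for illustration, and only Outcome.1 and the sex == 0 group are shown. The point estimates agree, but the single nested model estimates one residual standard deviation from all 10 rows, while the per-group fit estimates it from its own 5 rows, so the standard errors generally differ.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
new <- data.frame(Outcome.1 = rnorm(10), Outcome.2 = rnorm(10),
                  sex = rep(0:1, 5), Pred = rnorm(10))
new2 <- transform(new, sex = factor(sex))

# One model for both sexes: a single pooled residual variance
pooled <- lm(Outcome.1 ~ sex/Pred - 1, data = new2)
# Separate model for sex == 0: its own residual variance
separate <- lm(Outcome.1 ~ Pred, data = subset(new2, sex == "0"))

# Point estimates for the sex == 0 group agree
print(coef(pooled)[c("sex0", "sex0:Pred")])
print(coef(separate))

# Residual standard deviations and standard errors need not agree
sig_pooled <- summary(pooled)$sigma
sig_sep <- summary(separate)$sigma
se_pooled <- summary(pooled)$coefficients[c("sex0", "sex0:Pred"), "Std. Error"]
se_sep <- summary(separate)$coefficients[, "Std. Error"]
print(c(pooled = sig_pooled, separate = sig_sep))
print(rbind(pooled = se_pooled, separate = se_sep))
```

Because the design matrix of the nested model is block-diagonal by sex, the two sets of standard errors differ only by the ratio of the two residual standard deviations.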