Using 'sapply' and 'by' in one function

6 messages · David Freedman, Gabor Grothendieck, Hadley Wickham

#
Greetings,

I'm having a problem with something that I think is very simple - I'd
like to be able to use the 'sapply' and 'by' functions in 1 function
to be able (for example) to get regression coefficients from multiple
models by a grouping variable.  I think that I'm missing something
that is probably obvious to experienced users.

Here's a simple (trivial) example of what I'd like to do:

new <- data.frame(Outcome.1=rnorm(10),Outcome.2=rnorm(10),sex=rep(0:1,5),Pred=rnorm(10))
fxa <- function(x,data)   { lm(x~Pred,data=data)$coef }
sapply(new[,1:2],fxa,new)  # this yields coefficients for the predictor in separate models

fxb <- function(x)   { lm(Outcome.1~Pred,data=x)$coef }
by(new,new$sex,fxb) #yields the coefficient for Outcome.1 for each sex

## I'd like to combine 'sapply' and 'by' to get the regression
## coefficients for Outcome.1 and Outcome.2 by each sex, rather than
## running fxb a second time to predict 'Outcome.2' or subsetting the
## data by sex before I run the function, but the following doesn't work:

by(new,new$sex,FUN=function(x)sapply(x[,1:2],fxa,new))
'Error in model.frame.default(formula = x ~ Pred, data = data,
drop.unused.levels = TRUE) :
  variable lengths differ (found for 'Pred')'

##I understand the error message - the length of 'Pred' is 10 while
the length of each sex group is 5, but I'm not sure how to correctly
write the 'by' function to use 'sapply' inside it.   Could someone
please point me in the right direction?  Thanks very much in advance

David S Freedman, CDC (Atlanta USA) [definitely not the well-known
statistician, David A Freedman, in Berkeley]
#
Because new is passed to fxa as its second argument, lm receives the
full data frame rather than the per-sex subset x, hence the error.
Try this:

by(new, new$sex, function(x) sapply(x[1:2], function(y) coef(lm(y ~ Pred, x))))

Actually, you can do the above without sapply as lm can take a matrix
for the dependent variable:

by(new, new$sex, function(x) coef(lm(as.matrix(x[1:2]) ~ Pred, x)))
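
As a quick check that the two forms above agree, here is a minimal sketch. The set.seed call and the names via_sapply, via_matrix and same are assumptions added for illustration; they are not part of the original post.

```r
set.seed(1)  # assumption: seed added only so the example is reproducible
new <- data.frame(Outcome.1 = rnorm(10), Outcome.2 = rnorm(10),
                  sex = rep(0:1, 5), Pred = rnorm(10))

# sapply inside by: one lm per outcome, each fitted on the per-sex subset x
via_sapply <- by(new, new$sex, function(x)
  sapply(x[1:2], function(y) coef(lm(y ~ Pred, data = x))))

# matrix response: a single lm call per sex fits both outcomes at once
via_matrix <- by(new, new$sex, function(x)
  coef(lm(as.matrix(x[1:2]) ~ Pred, data = x)))

# compare the coefficient matrices for the first sex group
# (rows: intercept and Pred; columns: the two outcomes)
same <- all.equal(unclass(via_sapply[[1]]), unclass(via_matrix[[1]]),
                  check.attributes = FALSE)
print(same)
```

The key point in both versions is that lm is given x, the subset that by hands to the function, and never the full data frame.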
#
Actually thinking about this, not only do you not need sapply but you
don't even need by:

new2 <- transform(new, sex = factor(sex))
coef(lm(as.matrix(new2[1:2]) ~ sex/Pred - 1, new2))
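
To see how the rows of that single fit line up with the per-sex fits, here is a small sketch. The seed and the names pooled, per_sex and same0 are illustrative assumptions, not from the thread.

```r
set.seed(1)  # assumption: seed added only so the example is reproducible
new <- data.frame(Outcome.1 = rnorm(10), Outcome.2 = rnorm(10),
                  sex = rep(0:1, 5), Pred = rnorm(10))
new2 <- transform(new, sex = factor(sex))

# one model with a separate intercept and Pred slope for each sex
pooled <- coef(lm(as.matrix(new2[1:2]) ~ sex/Pred - 1, data = new2))
print(rownames(pooled))

# separate fits per sex, for comparison
per_sex <- by(new, new$sex, function(x)
  coef(lm(as.matrix(x[1:2]) ~ Pred, data = x)))

# rows "sex0" and "sex0:Pred" hold the intercept and slope for sex 0
same0 <- all.equal(pooled[c("sex0", "sex0:Pred"), ], per_sex[[1]],
                   check.attributes = FALSE)
print(same0)
```

The sex/Pred term nests Pred within sex, and dropping the intercept with - 1 gives each sex its own intercept row, so the coefficient matrix stacks the two per-sex fits.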
#
On Feb 10, 2008 8:25 AM, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
Although that's a very slightly different model, as it assumes that
both sexes have the same error variance.
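
A rough sketch of where the difference shows up, using a single outcome for brevity. The seed and the names pooled_fit, sex0_fit, se_pooled and se_sex0 are assumptions made for this illustration.

```r
set.seed(1)  # assumption: seed added only so the example is reproducible
new <- data.frame(Outcome.1 = rnorm(10), Outcome.2 = rnorm(10),
                  sex = rep(0:1, 5), Pred = rnorm(10))
new2 <- transform(new, sex = factor(sex))

# single model: one residual variance shared by both sexes (10 - 4 = 6 df)
pooled_fit <- lm(Outcome.1 ~ sex/Pred - 1, data = new2)

# separate model for sex 0 only: its own residual variance (5 - 2 = 3 df)
sex0_fit <- lm(Outcome.1 ~ Pred, data = subset(new2, sex == "0"))

# the slope estimates agree ...
print(coef(pooled_fit)["sex0:Pred"])
print(coef(sex0_fit)["Pred"])

# ... but the standard errors are scaled by different residual sigmas
se_pooled <- coef(summary(pooled_fit))["sex0:Pred", "Std. Error"]
se_sex0   <- coef(summary(sex0_fit))["Pred", "Std. Error"]
print(c(pooled = se_pooled, separate = se_sex0))
print(c(pooled = summary(pooled_fit)$sigma, separate = summary(sex0_fit)$sigma))
```

The ratio of the two standard errors equals the ratio of the two residual sigmas, which is the sense in which the models differ only in their error-variance assumption.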

Hadley
#
On Feb 10, 2008 10:12 AM, hadley wickham <h.wickham at gmail.com> wrote:
But the output is the coefficients, and they are identical.
#
For the sake of an example I'm sure that David simply omitted the part
of his analysis where he looked at the standard errors as well ;)

Hadley