
splitting into multiple dataframes and then create a loop to work

6 messages · Dimitris Rizopoulos, Dennis Murphy, Nilaya Sharma

#
Hi:

This is straightforward to do with the plyr package:

# install.packages('plyr')
library('plyr')
set.seed(1234)
# toy data: four groups (clvar = 1..4) of 10 rows each
df <- data.frame(clvar = rep(1:4, each = 10), yvar = rnorm(40, 10, 6),
                 var1 = rnorm(40, 10, 4), var2 = rnorm(40, 10, 4),
                 var3 = rnorm(40, 5, 2), var4 = rnorm(40, 10, 3),
                 var5 = rnorm(40, 15, 8))
# one lm() per level of clvar; '. - clvar' uses every other column as a predictor
mods <- dlply(df, .(clvar), function(d) lm(yvar ~ . - clvar, data = d))
summary(mods[[1]])   # model for the first group

mods is a list of model objects, one per subgroup defined by clvar.
You can use extraction functions to pull out pieces from each model,
e.g.,

ldply(mods, function(m) summary(m)[['r.squared']])  # R^2 per group
ldply(mods, function(m) coef(m))                     # coefficients per group
ldply(mods, function(m) resid(m))                    # residuals, one row per group

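The dlply() function takes a data frame as input and returns a list;
conversely, the ldply() function takes a list as input and returns a
data frame. The function you call inside has to be compatible with
both: it receives one piece of the input (a sub-data frame for dlply(),
one list element for ldply()), and its return value must be something
the output type can hold.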
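For comparison, here is a rough base-R sketch of the same data frame -> list -> data frame round trip. This is not plyr and was not posted in the thread: split() plays the role of the dlply() splitting step, lapply() fits one model per piece, and do.call(rbind, ...) stacks the per-model results much as ldply() does (giving a matrix rather than a data frame). The object names pieces, mods_base, coef_tab and rsq are illustrative only.

```r
# Same simulated data as in the plyr example
set.seed(1234)
df <- data.frame(clvar = rep(1:4, each = 10), yvar = rnorm(40, 10, 6),
                 var1 = rnorm(40, 10, 4), var2 = rnorm(40, 10, 4),
                 var3 = rnorm(40, 5, 2), var4 = rnorm(40, 10, 3),
                 var5 = rnorm(40, 15, 8))

# data frame -> list of sub-data frames, one per level of clvar
pieces <- split(df, df$clvar)

# list -> list: fit a separate model to each piece (the dlply() step)
mods_base <- lapply(pieces, function(d) lm(yvar ~ . - clvar, data = d))

# list -> table: one row of coefficients per group (the ldply() step);
# row names come from the list names, i.e. the clvar levels
coef_tab <- do.call(rbind, lapply(mods_base, coef))

# a scalar per model simplifies to a named vector
rsq <- sapply(mods_base, function(m) summary(m)$r.squared)

print(coef_tab)
print(rsq)
```

plyr keeps the grouping variable as a column in the ldply() output; in this sketch it survives only as row names or vector names.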

HTH,
Dennis
On Mon, Aug 29, 2011 at 8:37 AM, Nilaya Sharma <nilaya.sharma at gmail.com> wrote:
#
You can do this using the function lmList() from the nlme package, without 
having to split the data frame, e.g.,

library(nlme)

# one lm() fit per level of clvar; the grouping variable goes after '|'
mlis <- lmList(yvar ~ . - clvar | clvar, data = df)
mlis
summary(mlis)


I hope it helps.

Best,
Dimitris
On 8/29/2011 5:37 PM, Nilaya Sharma wrote:

#
Hi:

Dimitris' solution is appropriate, but it should be mentioned that
the approach I offered earlier in this thread differs from the
lmList() approach. lmList() uses a pooled estimate of the error
variance (the residual MSE, which you can see at the bottom of the
output from summary(mlis)), whereas the plyr approach subdivides the
data into distinct sub-data frames and analyzes them as separate
entities, each with its own residual MSE. As a result, the residual
MSEs differ between the two approaches, which changes the standard
errors and hence the significance tests on the model coefficients.
You need to decide which approach is better for your purposes.
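To make the difference concrete, here is a small base-R sketch (not from the thread) that fits the groups separately and then computes a pooled residual SD as sqrt(sum of group RSS / sum of group residual df). That pooling formula is the standard one and is assumed, not verified here, to be what summary(mlis) reports. Because coefficient standard errors are proportional to the residual SD, swapping in the pooled value rescales each group's SEs by sigma_pooled / sigma_sep. All object names are illustrative.

```r
# Same simulated data as in the plyr example
set.seed(1234)
df <- data.frame(clvar = rep(1:4, each = 10), yvar = rnorm(40, 10, 6),
                 var1 = rnorm(40, 10, 4), var2 = rnorm(40, 10, 4),
                 var3 = rnorm(40, 5, 2), var4 = rnorm(40, 10, 3),
                 var5 = rnorm(40, 15, 8))

# Separate fits, one per group (the plyr-style analysis)
mods_sep <- lapply(split(df, df$clvar),
                   function(d) lm(yvar ~ . - clvar, data = d))

# Per-group residual SD: each model uses only its own residuals
sigma_sep <- sapply(mods_sep, function(m) summary(m)$sigma)

# Pooled residual SD: total residual SS over total residual df
rss <- sapply(mods_sep, function(m) sum(resid(m)^2))
rdf <- sapply(mods_sep, df.residual)   # 10 obs - 6 coefficients = 4 each
sigma_pooled <- sqrt(sum(rss) / sum(rdf))

# Standard errors from the separate fits (rows = terms, columns = groups)
se_sep <- sapply(mods_sep, function(m) coef(summary(m))[, "Std. Error"])

# Same SEs rescaled to the pooled sigma (column j times sigma_pooled / sigma_sep[j])
se_pooled <- sweep(se_sep, 2, sigma_pooled / sigma_sep, FUN = "*")

print(round(cbind(sigma_sep, sigma_pooled), 3))
print(round(se_pooled / se_sep, 3))
```

With equal group sizes and the same number of coefficients in every group, the pooled variance is simply the average of the per-group variances. The pooled estimate also rests on more residual degrees of freedom (16 here rather than 4 per group), which is part of why the significance tests differ.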

Cheers,
Dennis

On Mon, Aug 29, 2011 at 12:02 PM, Dimitris Rizopoulos
<d.rizopoulos at erasmusmc.nl> wrote:
#
Well, if a pooled estimate of the residual standard error is not 
desirable, then you just need to set the argument 'pool' of lmList() 
to FALSE, e.g.,

mlis <- lmList(yvar ~ . - clvar | clvar, data = df, pool = FALSE)
summary(mlis)


Best,
Dimitris
On 8/29/2011 9:20 PM, Dennis Murphy wrote: