splitting into multiple dataframes and then create a loop to work

Hi:

This is straightforward to do with the plyr package:

# install.packages('plyr')
library('plyr')
set.seed(1234)
df <- data.frame(clvar = rep(1:4, each = 10), yvar = rnorm(40, 10, 6),
                 var1 = rnorm(40, 10, 4), var2 = rnorm(40, 10, 4),
                 var3 = rnorm(40, 5, 2), var4 = rnorm(40, 10, 3),
                 var5 = rnorm(40, 15, 8))
mods <- dlply(df, .(clvar), function(d) lm(yvar ~ . - clvar, data = d))
summary(mods[[1]])

mods is a list of model objects, one per subgroup defined by clvar.
You can use extraction functions to pull out pieces from each model,
e.g.,

ldply(mods, function(m) summary(m)[['r.squared']])
ldply(mods, function(m) coef(m))
ldply(mods, function(m) resid(m))

The dlply() function reads a data frame as input and outputs to a
list; conversely, the ldply() function reads from a list and outputs
to a data frame. The functions you call inside have to be compatible
with the input and output data types.
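
The same split/apply/combine idea can also be sketched in base R, without
plyr, to get the matrix of p-values asked for in the original post (one
simple regression of yvar on each predictor within each level of clvar).
This is only a sketch: xvars, slope_p, pieces and pmat are names made up
for the illustration, and split() plus sapply() stand in for dlply() and
ldply() here.

```r
set.seed(1234)
df <- data.frame(clvar = rep(1:4, each = 10), yvar = rnorm(40, 10, 6),
                 var1 = rnorm(40, 10, 4), var2 = rnorm(40, 10, 4),
                 var3 = rnorm(40, 5, 2), var4 = rnorm(40, 10, 3),
                 var5 = rnorm(40, 15, 8))
xvars <- paste0("var", 1:5)   # predictor names (illustrative helper)

# p-value of the slope from a simple regression of yvar on one predictor
slope_p <- function(d, x) {
  fit <- lm(reformulate(x, response = "yvar"), data = d)
  summary(fit)$coefficients[2, 4]
}

# data frame -> list of sub-data frames, one per level of clvar
pieces <- split(df, df$clvar)

# list -> matrix: one row per level of clvar, one column per predictor
pmat <- t(sapply(pieces, function(d) sapply(xvars, function(x) slope_p(d, x))))
pmat
```

Because the grouping is taken from df$clvar, the same code should carry over
unchanged to a factor with 100 levels.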

HTH,
Dennis

On 8/29/2011 5:37 PM, Nilaya Sharma wrote:
Dear All

Sorry for this simple question; I could not solve it despite spending days on it.

My data looks like this:

# data
set.seed(1234)
clvar <- c(rep(1, 10), rep(2, 10), rep(3, 10), rep(4, 10))  # I have 100 levels for this factor variable
yvar <- rnorm(40, 10, 6)
var1 <- rnorm(40, 10, 4); var2 <- rnorm(40, 10, 4); var3 <- rnorm(40, 5, 2)
var4 <- rnorm(40, 10, 3); var5 <- rnorm(40, 15, 8)  # just an example
df <- data.frame(clvar, yvar, var1, var2, var3, var4, var5)

# manual splitting
df1 <- subset(df, clvar == 1)
df2 <- subset(df, clvar == 2)
df3 <- subset(df, clvar == 3)
df4 <- subset(df, clvar == 4)
df5 <- subset(df, clvar == 5)  # empty in this example, which has only four levels

# I tried to automate it

for (i in 1:5) {
  df[i] <- subset(df, clvar == i)
}

I know this should not work, since df[i] is a single variable, and it did not.
But I could not find a way to output multiple data frames from this loop. My
limited R knowledge did not help at all!
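
For what it is worth, one common way to get several data frames out of a
loop is to collect them in a list rather than assigning into df[i], which
overwrites column i of df. A minimal sketch, using a cut-down version of the
example data; levs, dfs and dfs2 are names invented for the illustration:

```r
set.seed(1234)
df <- data.frame(clvar = rep(1:4, each = 10), yvar = rnorm(40, 10, 6))

# Loop over the levels actually present, storing each subset in a list
levs <- unique(df$clvar)
dfs <- vector("list", length(levs))
names(dfs) <- paste0("df", levs)
for (i in seq_along(levs)) {
  dfs[[i]] <- subset(df, clvar == levs[i])
}

# split() builds an equivalent list in a single call
dfs2 <- split(df, df$clvar)

dfs$df3            # the sub-data frame for clvar == 3
sapply(dfs, nrow)  # number of rows in each piece
```

Keeping the pieces in a list means lapply() or sapply() can then be run over
all of them, however many levels the factor has.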

# working on each variable, just trying a simple function
a <- 3:7  # column indices of var1..var5
out1 <- lapply(1:5, function(ind) {
  lm(df1$yvar ~ df1[, a[ind]])
})
p1 <- lapply(out1, function(m) summary(m)$coefficients[, 4][2])
p1 <- do.call(rbind, p1)

My ultimate objective is to apply this function to all the data frames
created (i.e. df1, df2, df3, df4, df5) and create five corresponding p-value
vectors (p1, p2, p3, p4, p5). The output would then be a matrix of clvar and
the corresponding p-values:

clvar   var1   var2   var3   var4   var5
1
2
3
4

Please help me!

Thanks

NIL


______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

You can do this using function lmList() from package nlme, without 
having to split the data frames, e.g.,

library(nlme)

mlis <- lmList(yvar ~ .  - clvar | clvar, data = df)
mlis
summary(mlis)
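
As a rough check of what lmList() returns, the sketch below pulls the
per-group coefficients out with coef() and compares the first row with a
separate lm() fit on the clvar == 1 subset. It assumes nlme is available (it
is a recommended package that normally ships with R), and it spells the
predictors out explicitly instead of using the `.` shorthand, purely to keep
the example unambiguous; cf and fit1 are illustrative names.

```r
library(nlme)

set.seed(1234)
df <- data.frame(clvar = rep(1:4, each = 10), yvar = rnorm(40, 10, 6),
                 var1 = rnorm(40, 10, 4), var2 = rnorm(40, 10, 4),
                 var3 = rnorm(40, 5, 2), var4 = rnorm(40, 10, 3),
                 var5 = rnorm(40, 15, 8))

# One regression per level of clvar, fitted in a single call
mlis <- lmList(yvar ~ var1 + var2 + var3 + var4 + var5 | clvar, data = df)

# coef() gives one row per level of clvar and one column per coefficient
cf <- coef(mlis)
cf

# The estimates for the first group should match a separate lm() fit
fit1 <- lm(yvar ~ var1 + var2 + var3 + var4 + var5,
           data = subset(df, clvar == 1))
all.equal(as.numeric(unlist(cf[1, ])), as.numeric(coef(fit1)))
```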

I hope it helps.

Best,
Dimitris

Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/
Hi:

Dimitris' solution is appropriate, but it needs to be mentioned that
the approach I offered earlier in this thread differs from the
lmList() approach. lmList() uses a pooled measure of error MSE (which
you can see at the bottom of the output from summary(mlis) ), whereas
the plyr approach subdivides the data into distinct sub-data frames
and analyzes them as separate entities. As a result, the residual MSEs
will differ between the two approaches, which in turn affects the
significance tests on the model coefficients. You need to decide which
approach is better for your purposes.
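
To make the difference concrete, here is a base R sketch (it does not call
lmList() itself) that computes a separate residual standard error for each
subgroup and a single pooled one, then the p-value for the var1 slope in the
first subgroup under each. The pooled calculation is my understanding of what
pooling amounts to, so treat it as an illustration rather than as nlme's
exact code; all object names are invented for the example.

```r
set.seed(1234)
df <- data.frame(clvar = rep(1:4, each = 10), yvar = rnorm(40, 10, 6),
                 var1 = rnorm(40, 10, 4), var2 = rnorm(40, 10, 4),
                 var3 = rnorm(40, 5, 2), var4 = rnorm(40, 10, 3),
                 var5 = rnorm(40, 15, 8))

# Separate fits, one per level of clvar
mods <- lapply(split(df, df$clvar), function(d) lm(yvar ~ . - clvar, data = d))

rss <- sapply(mods, function(m) sum(resid(m)^2))  # residual sum of squares
dfr <- sapply(mods, df.residual)                  # residual df per group

sigma_sep  <- sqrt(rss / dfr)            # one residual SE per subgroup
sigma_pool <- sqrt(sum(rss) / sum(dfr))  # single pooled residual SE

# var1 slope in the first subgroup: same estimate, two standard errors
est     <- coef(mods[["1"]])["var1"]
se_sep  <- summary(mods[["1"]])$coefficients["var1", "Std. Error"]
se_pool <- se_sep * sigma_pool / sigma_sep["1"]

p_sep  <- 2 * pt(-abs(est / se_sep),  df = dfr["1"])   # separate error, 4 df
p_pool <- 2 * pt(-abs(est / se_pool), df = sum(dfr))   # pooled error, 16 df
c(separate = unname(p_sep), pooled = unname(p_pool))
```

The coefficient estimate is identical in both cases; only the error estimate
and its degrees of freedom change, which is why the significance tests can
disagree.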

Cheers,
Dennis

On Mon, Aug 29, 2011 at 12:02 PM, Dimitris Rizopoulos wrote:
> [snip: lmList() suggestion]
>
> On 8/29/2011 5:37 PM, Nilaya Sharma wrote:
>> [snip: original question]

Well, if a pooled estimate of the residual standard error is not
desirable, then you just need to set the argument 'pool' of lmList() to
FALSE, e.g.,

mlis <- lmList(yvar ~ .  - clvar | clvar, data = df, pool = FALSE)
summary(mlis)

Best,
Dimitris