splitting into multiple dataframes and then create a loop to work
6 messages · Dimitris Rizopoulos, Dennis Murphy, Nilaya Sharma
Hi:
This is straightforward to do with the plyr package:
# install.packages('plyr')
library('plyr')
set.seed(1234)
df <- data.frame(clvar = rep(1:4, each = 10), yvar = rnorm(40, 10, 6),
var1 = rnorm(40, 10, 4), var2 = rnorm(40, 10, 4),
var3 = rnorm(40, 5, 2), var4 = rnorm(40, 10, 3),
var5 = rnorm(40, 15, 8))
mods <- dlply(df, .(clvar), function(d) lm(yvar ~ . - clvar, data = d))
summary(mods[[1]])
mods is a list of model objects, one per subgroup defined by clvar.
You can use extraction functions to pull out pieces from each model,
e.g.,
ldply(mods, function(m) summary(m)[['r.squared']])
ldply(mods, function(m) coef(m))
ldply(mods, function(m) resid(m))
The dlply() function reads a data frame as input and outputs to a
list; conversely, the ldply() function reads from a list and outputs
to a data frame. The functions you call inside have to be compatible
with the input and output data types.
HTH,
Dennis
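For readers who prefer not to depend on plyr, the same split-apply-combine pattern can be sketched in base R. This is only an illustration and not part of the reply above; the names pieces, mods_base, r2 and coefs are made up here, and the formula lists the predictors explicitly instead of using ". - clvar".

```r
set.seed(1234)
df <- data.frame(clvar = rep(1:4, each = 10), yvar = rnorm(40, 10, 6),
                 var1 = rnorm(40, 10, 4), var2 = rnorm(40, 10, 4),
                 var3 = rnorm(40, 5, 2), var4 = rnorm(40, 10, 3),
                 var5 = rnorm(40, 15, 8))

# split() returns a named list of sub-data frames, one per level of clvar
pieces <- split(df, df$clvar)

# fit the same model within each piece (roughly the role dlply() plays above)
mods_base <- lapply(pieces, function(d)
  lm(yvar ~ var1 + var2 + var3 + var4 + var5, data = d))

# collect one number per model into a named vector
r2 <- sapply(mods_base, function(m) summary(m)$r.squared)

# collect the coefficient vectors into a matrix, one row per group
coefs <- t(sapply(mods_base, coef))
```

The list names ("1" to "4") carry through to r2 and to the row names of coefs, so the group labels are not lost along the way.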
On Mon, Aug 29, 2011 at 8:37 AM, Nilaya Sharma <nilaya.sharma at gmail.com> wrote:
Dear All
Sorry for this simple question; I could not solve it despite spending days on it.
My data looks like this:
# data
set.seed(1234)
clvar <- c(rep(1, 10), rep(2, 10), rep(3, 10), rep(4, 10)) # I have 100 levels for this factor variable
yvar <- rnorm(40, 10, 6)
var1 <- rnorm(40, 10, 4); var2 <- rnorm(40, 10, 4); var3 <- rnorm(40, 5, 2)
var4 <- rnorm(40, 10, 3); var5 <- rnorm(40, 15, 8) # just example
df <- data.frame(clvar, yvar, var1, var2, var3, var4, var5)
# manual splitting
df1 <- subset(df, clvar == 1)
df2 <- subset(df, clvar == 2)
df3<- subset(df, clvar == 3)
df4<- subset(df, clvar == 4)
df5<- subset(df, clvar == 5)
# I tried to mechanize it
for (i in 1:5) {
  df[i] <- subset(df, clvar == i)
}
I know it should not work, as df[i] is a single variable, and it did not. But I
could not find a way to output multiple data frames from this loop. My limited
R knowledge did not help at all!
# working on each variable, just trying a simple function
a <- 3:8
out1 <- lapply(1:5, function(ind) {
  lm(df1$yvar ~ df1[, a[ind]])
})
p1 <- lapply(out1, function(m) summary(m)$coefficients[, 4][2])
p1 <- do.call(rbind, p1)
My ultimate objective is to apply this function to all the data frames
created (i.e. df1, df2, df3, df4, df5) and create five corresponding p-value
vectors (p1, p2, p3, p4, p5). The output would then be a matrix of clvar and
the corresponding p-values:
clvar  var1  var2  var3  var4  var5
1
2
3
4
Please help me !
Thanks
NIL
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
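The table requested in the question (one row per clvar level, one p-value per predictor) could be built along the following lines. This is a sketch under the assumption that the p-value of interest is the slope's p-value from a one-predictor regression of yvar on each variable, fitted within each group, as in the lm(df1$yvar ~ df1[, a[ind]]) attempt; slope_p, predictors and pmat are hypothetical names.

```r
set.seed(1234)
df <- data.frame(clvar = rep(1:4, each = 10), yvar = rnorm(40, 10, 6),
                 var1 = rnorm(40, 10, 4), var2 = rnorm(40, 10, 4),
                 var3 = rnorm(40, 5, 2), var4 = rnorm(40, 10, 3),
                 var5 = rnorm(40, 15, 8))

predictors <- paste0("var", 1:5)

# p-value of the slope from a one-predictor regression within one sub-data frame
slope_p <- function(d, v) {
  fit <- lm(reformulate(v, response = "yvar"), data = d)
  summary(fit)$coefficients[2, 4]  # row 2 = slope, column 4 = Pr(>|t|)
}

# outer sapply runs over groups, inner sapply over predictors;
# t() puts groups in rows and predictors in columns
pmat <- t(sapply(split(df, df$clvar), function(d)
  sapply(predictors, function(v) slope_p(d, v))))
pmat
```

No df1, df2, ... objects are created; split() keeps the sub-data frames in a list, which scales to 100 levels without any change to the code.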
You can do this using function lmList() from package nlme, without
having to split the data frames, e.g.,
library(nlme)
mlis <- lmList(yvar ~ . - clvar | clvar, data = df)
mlis
summary(mlis)
I hope it helps.
Best,
Dimitris
Dimitris Rizopoulos Assistant Professor Department of Biostatistics Erasmus University Medical Center Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands Tel: +31/(0)10/7043478 Fax: +31/(0)10/7043014 Web: http://www.erasmusmc.nl/biostatistiek/
Hi:
Dimitris' solution is appropriate, but it needs to be mentioned that
the approach I offered earlier in this thread differs from the lmList()
approach. lmList() uses a pooled measure of error MSE (which you can see
at the bottom of the output from summary(mlis)), whereas the plyr approach
subdivides the data into distinct sub-data frames and analyzes them as
separate entities. As a result, the residual MSEs will differ between the
two approaches, which in turn affects the significance tests on the model
coefficients. You need to decide which approach is better for your purposes.
Cheers,
Dennis
Well, if a pooled estimate of the residual standard error is not
desirable, then you just need to set argument 'pool' of lmList() to
FALSE, e.g.,
mlis <- lmList(yvar ~ . - clvar | clvar, data = df, pool = FALSE)
summary(mlis)
Best,
Dimitris
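The difference between the two error estimates discussed in this thread can be checked by hand in base R. The sketch below assumes that the pooled estimate is the usual one, i.e. the summed residual sums of squares divided by the summed residual degrees of freedom; it does not call lmList() itself, and fits, sigma_sep and sigma_pooled are illustrative names.

```r
set.seed(1234)
df <- data.frame(clvar = rep(1:4, each = 10), yvar = rnorm(40, 10, 6),
                 var1 = rnorm(40, 10, 4), var2 = rnorm(40, 10, 4),
                 var3 = rnorm(40, 5, 2), var4 = rnorm(40, 10, 3),
                 var5 = rnorm(40, 15, 8))

# one separate fit per level of clvar
fits <- lapply(split(df, df$clvar), function(d)
  lm(yvar ~ var1 + var2 + var3 + var4 + var5, data = d))

# separate residual standard errors, one per group
sigma_sep <- sapply(fits, function(m) summary(m)$sigma)

# pooled residual standard error across all groups
rss <- sapply(fits, function(m) sum(resid(m)^2))  # residual sum of squares per group
rdf <- sapply(fits, df.residual)                  # residual degrees of freedom per group
sigma_pooled <- sqrt(sum(rss) / sum(rdf))

sigma_sep
sigma_pooled
```

The coefficient standard errors scale with the residual standard error, so a group whose own value in sigma_sep is well below sigma_pooled will tend to show larger t statistics under separate fits than under pooling, and the reverse for a group with a larger value.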