reshape/aggregate
6 messages · Ista Zahn, jim holtman, PIKAL Petr +1 more
Hi
Hi all,
I apologize for this probably stupid question, but I really can't figure it out.
I have a data frame like this:
group <- c(rep('A', 8), rep('B', 15), rep('C', 6))
time <- c(rep(1:4, 2), rep(1:5, 3), rep(1:3, 2))
value <- runif(29, 1, 10)
dfx <- data.frame(group, time, value)
I want to calculate the mean and standard deviation of all values that belong to the same group and the same time, and end up with a data frame with the columns time, group, mean and sd that contains the calculated values for every group at every time point exactly once (12 rows). What is the most elegant way to do this? Oh, and I would like to avoid renaming columns (like the _X1/_X2 created by casting with multiple functions), if possible.
I am sure that this is pretty basic, but I have already wasted a ridiculous amount of time on this.
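The expected 12 rows can be checked directly from the group and time vectors in the question: group A is observed twice at times 1 to 4, B three times at times 1 to 5, and C twice at times 1 to 3. A minimal base-R sketch (the name `counts` is introduced here only for illustration):

```r
# Same grouping vectors as in the question
group <- c(rep("A", 8), rep("B", 15), rep("C", 6))
time  <- c(rep(1:4, 2), rep(1:5, 3), rep(1:3, 2))

# Number of observations in each group/time cell; 0 means the pair never occurs
counts <- table(group, time)
print(counts)

# Only non-empty cells become summary rows: 4 (A) + 5 (B) + 3 (C)
sum(counts > 0)  # 12

# Every non-empty cell holds at least 2 values, so sd() is defined (not NA) everywhere
all(counts[counts > 0] >= 2)  # TRUE
```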
See ?aggregate:
aggregate(dfx$value, list(group=dfx$group, time=dfx$time), function(x) c(mean(x), sd(x)))
and maybe the plyr package could also help you.
Regards
Petr
Thanks, Kai
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
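Petr's aggregate() call above returns both statistics as a single matrix column, which prints with generated names such as x.1 and x.2, the kind of renaming Kai hoped to avoid. A minimal sketch, assuming base R only and a fixed seed added purely for reproducibility: naming the vector returned by the summary function and binding that matrix column back onto the grouping columns yields plain columns group, time, mean and sd.

```r
set.seed(1)  # assumption: any seed works; only makes this sketch reproducible
group <- c(rep("A", 8), rep("B", 15), rep("C", 6))
time  <- c(rep(1:4, 2), rep(1:5, 3), rep(1:3, 2))
value <- runif(29, 1, 10)
dfx   <- data.frame(group, time, value)

# Naming the returned vector gives the matrix column 'x' the colnames "mean" and "sd"
res <- aggregate(dfx$value, list(group = dfx$group, time = dfx$time),
                 function(x) c(mean = mean(x), sd = sd(x)))

# res$x is a 12 x 2 matrix; cbind() onto the grouping columns turns its
# colnames into ordinary column names, so no separate renaming step is needed
out <- cbind(res[c("group", "time")], res$x)
out
```

The rows come back sorted with the first grouping variable varying fastest (all groups at time 1, then time 2, and so on); reorder with out[order(out$group, out$time), ] if a group-major layout is preferred.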
The plyr solution is:
library(plyr)
ddply(dfx, .(group, time), summarize, mean = mean(value), sd = sd(value))
Best,
Ista
On Wed, Aug 31, 2011 at 7:13 AM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org
You can use data.table:
group <- c(rep('A', 8), rep('B', 15), rep('C', 6))
time <- c(rep(1:4, 2), rep(1:5, 3), rep(1:3, 2))
value <- runif(29, 1, 10)
dfx <- data.frame(group, time, value)
library(data.table)
dfx <- data.table(dfx)
# one row per group/time pair, with the result columns named mean and sd
dfx[, list(mean = mean(value), sd = sd(value)), by = list(group, time)]
group time mean sd
[1,] A 1 7.902432 0.8484807
[2,] A 2 5.583566 1.1996167
[3,] A 3 3.412691 1.1138794
[4,] A 4 7.786522 2.2367483
[5,] B 1 6.669257 2.1476769
[6,] B 2 2.902291 1.6630821
[7,] B 3 6.913593 0.9110182
[8,] B 4 4.713124 0.9521689
[9,] B 5 7.285824 1.5884689
[10,] C 1 3.799665 3.7728015
[11,] C 2 9.218785 0.9415034
[12,] C 3 5.098077 3.5256497
On Wed, Aug 31, 2011 at 4:19 AM, Kai Megerle <govokai at gmail.com> wrote:
Jim Holtman Data Munger Guru What is the problem that you are trying to solve?
Hi
The plyr solution is:
library(plyr)
ddply(dfx, .(group, time), summarize, mean = mean(value), sd = sd(value))
I tried to do the task with ddply, but I had difficulty understanding the correct syntax. Maybe in the next release of plyr, summarise could be referenced on the ddply help page, or something like this could be added: "When computing summary values for a data frame by levels of a factor, use the syntax ddply(.data, .variables, summarise, ...)", where ... are the name = expression pairs to compute.
Regards
Petr
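The syntax becomes easier to read once it is clear how the arguments travel: in Ista's call, summarise sits in ddply's function slot, and mean = mean(value), sd = sd(value) are extra arguments that ddply passes on to that function for every group/time piece. The sketch below imitates that flow with two hypothetical helpers, mini_ddply() and mini_summarise(). They are not plyr's implementation or API, only base R written to show the calling pattern, and the seed is an arbitrary assumption for reproducibility.

```r
set.seed(42)  # assumption: arbitrary seed, only for reproducibility
group <- c(rep("A", 8), rep("B", 15), rep("C", 6))
time  <- c(rep(1:4, 2), rep(1:5, 3), rep(1:3, 2))
value <- runif(29, 1, 10)
dfx   <- data.frame(group, time, value)

# Hypothetical stand-in for summarise(): evaluates each name = expression pair
# from '...' inside one piece of the data and returns a one-row data frame
mini_summarise <- function(.data, ...) {
  exprs <- as.list(substitute(list(...)))[-1]  # unevaluated, e.g. mean = mean(value)
  env <- parent.frame()
  as.data.frame(lapply(exprs, function(e) eval(e, .data, env)))
}

# Hypothetical stand-in for ddply(): split by .variables, call .fun(piece, ...)
# on every piece, and stack the results next to the grouping values
mini_ddply <- function(.data, .variables, .fun, ...) {
  pieces <- split(.data, .data[.variables], drop = TRUE)
  rows <- lapply(pieces, function(piece) {
    cbind(piece[1, .variables, drop = FALSE], .fun(piece, ...))
  })
  out <- do.call(rbind, rows)
  out <- out[do.call(order, unname(as.list(out[.variables]))), ]  # by group, then time
  rownames(out) <- NULL
  out
}

# Same shape as the plyr call: the function comes third, its arguments follow
res <- mini_ddply(dfx, c("group", "time"), mini_summarise,
                  mean = mean(value), sd = sd(value))
res
```

The dot-prefixed formals in this sketch (.data, .variables, .fun) keep user-supplied names such as mean or sd from being matched to them, so those names are forwarded untouched and become the result's column names.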