column selection for aggregate()
11 messages · b k, Ivan Calandra, Gabor Grothendieck +1 more
Try summaryBy in the doBy package, e.g. using the built-in CO2 data set, summarize each numeric variable by each factor except for the factors Plant and Type:

library(doBy)
summaryBy(. ~ ., data = subset(CO2, select = -c(Plant, Type)))

On Mon, Jan 18, 2010 at 9:53 AM, Ivan Calandra <ivan.calandra at uni-hamburg.de> wrote:
Hi everybody!
I'm working in R today so I have a lot of questions (you may have
noticed that it's the 3rd email today). I'm new to R, so please excuse
the "spam"!
I have a dataset "ssfa" with many rows and the column names are:
> names(ssfa)
 [1] "SPECSHOR"  "BONE"      "TO_POS"    "MEASUREM"  "FACETTE"   "SHEARFAC"
 [7] "ENA_BA"    "SEL_FACET" "SEL_MEAS"  "Asfc"      "Smc"       "epLsar"
[13] "HAsfc4"    "HAsfc9"    "HAsfc16"   "HAsfc25"   "HAsfc36"   "HAsfc49"
[19] "HAsfc64"   "HAsfc81"   "HAsfc100"  "HAsfc121"  "Tfv"       "Ftfv"
I want to aggregate this way:
ssfamean <- aggregate(ssfa[c("Asfc", "Smc", "epLsar", "HAsfc4",
"HAsfc9", "HAsfc16", "HAsfc25", "HAsfc36", "HAsfc49", "HAsfc64",
"HAsfc81", "HAsfc100", "HAsfc121", "Tfv", "Ftfv")], ssfa[c("SPECSHOR",
"BONE", "TO_POS", "FACETTE", "SHEARFAC", "ENA_BA")], mean)
As you can see, it is very long since I have many variables. Basically I
want to select all numerical variables (columns 10 to 24), and all
categorical variables except MEASUREM, SEL_FACET and SEL_MEAS, without
having to write each of them. I would also like to avoid writing the
names; using the indexes would be nice.
I tried with:
> ssfamean <- aggregate(ssfa[c(ssfa[[10]]:ssfa[[24]])],
ssfa[c("SPECSHOR", "BONE", "TO_POS", "FACETTE", "SHEARFAC", "ENA_BA")],
mean)
but it obviously doesn't work (well "obviously"...)
Could anyone help me on this?
Thanks in advance
Ivan
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
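For the index-based selection asked about above, a minimal sketch in base R follows. It assumes a small made-up data frame `toy` standing in for ssfa (the values and the reduced column set are hypothetical). The key point is that `ssfa[[10]]` returns the contents of column 10, so `ssfa[[10]]:ssfa[[24]]` does not build a range of column positions, whereas single brackets with a vector of positions select the columns themselves.

```r
# Hypothetical stand-in for ssfa: two grouping columns, one column to
# leave out (MEASUREM), and two numeric columns.
toy <- data.frame(
  SPECSHOR = c("a", "a", "b", "b"),
  BONE     = c("x", "x", "y", "y"),
  MEASUREM = c("m1", "m2", "m1", "m2"),
  Asfc     = c(1, 3, 5, 7),
  Smc      = c(10, 20, 30, 40)
)

num_cols   <- 4:5       # positions of the numeric columns
group_cols <- c(1, 2)   # positions of the grouping columns, skipping 3

# Single brackets keep the data-frame structure that aggregate() expects.
toy_mean <- aggregate(toy[num_cols], toy[group_cols], mean)
print(toy_mean)
```

If the columns of ssfa are in the order shown by names(ssfa), the same pattern would read `aggregate(ssfa[10:24], ssfa[c(1:3, 5:7)], mean)`.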
It looks ok except you have both specified the wanted factors and removed the undesired factors from the data frame. You only need to do one of these as in the example I gave, not both, so the solution could be simpler.

On Mon, Jan 18, 2010 at 11:19 AM, Ivan Calandra <ivan.calandra at uni-hamburg.de> wrote:
Hi!
It looks like it works perfectly.
However, since I cannot check whether I get the right result or not, can you
please let me know if you see any mistakes?
Here is the code:
ssfamean <- summaryBy(.~SPECSHOR+BONE+TO_POS+FACETTE+SHEARFAC+ENA_BA, data =
subset(ssfa, select = - c(MEASUREM, SEL_FACET, SEL_MEAS)), FUN=mean)
That should give me the mean for all numerical variables grouped by
SPECSHOR+BONE+TO_POS+FACETTE+SHEARFAC+ENA_BA (i.e. the mean of the rows with
equal values for all these variables) on the data file ssfa without the
columns for MEASUREM, SEL_FACET, SEL_MEAS, right?
Sorry to ask such a stupid question, but this line will give me the data I
have to analyze, so I cannot afford to make any mistake here (or anywhere, of
course, but here I cannot really check).
Thanks in advance
Ivan
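One way to gain confidence in a grouped mean is to recompute a single group by hand and compare. The sketch below does this in base R with aggregate() rather than summaryBy, on a made-up data frame `toy` (the columns g1, g2 and v1 are hypothetical); the same spot check could be applied to any grouped result.

```r
# Hypothetical data: g1 and g2 are grouping columns, v1 is numeric.
toy <- data.frame(
  g1 = c("a", "a", "b", "b", "b"),
  g2 = c("x", "x", "x", "y", "y"),
  v1 = c(1, 3, 5, 7, 9)
)

agg <- aggregate(toy["v1"], toy[c("g1", "g2")], mean)

# Recompute one group directly from the raw rows.
rows     <- toy$g1 == "b" & toy$g2 == "y"
by_hand  <- mean(toy$v1[rows])
from_agg <- agg$v1[agg$g1 == "b" & agg$g2 == "y"]

# The result should have one row per distinct grouping combination.
n_groups <- nrow(unique(toy[c("g1", "g2")]))

stopifnot(isTRUE(all.equal(by_hand, from_agg)), nrow(agg) == n_groups)
```

Note that aggregate() omits rows with a missing value in any grouping column, so the row-count check only holds as written when the grouping columns contain no NAs.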
Hi

If I really wanted to aggregate all numerics by all non-numerics, this is how I would do it:

my.numerics <- which(sapply(zeta, is.numeric))
my.factor <- which(sapply(zeta, is.factor))
aggregate(zeta[, my.numerics], zeta[, my.factor], mean)

Regards
Petr

r-help-bounces at r-project.org wrote on 18.01.2010 16:33:17:
I didn't understand from the help what the function rowMeans really does,
but it looks like it doesn't take the categorical variables into account
(I want to calculate the means when the values of all categorical variables are the same, the second part of aggregate). Moreover, ssfa_num contains only numeric variables, meaning that the categories will not be
associated with it. I'm kind of confused by this approach. Do you think it would work for me?

Thanks
Ivan

b k wrote:
On Mon, Jan 18, 2010 at 10:17 AM, Ivan Calandra <ivan.calandra at uni-hamburg.de> wrote:
Thanks for your answer, but it doesn't work...
Here is what I get:
> ssfamean <- aggregate(ssfa[[10:24]],ssfa[c("SPECSHOR", "BONE",
"TO_POS", "FACETTE", "SHEARFAC", "ENA_BA")],mean)
Error in .subset2(x, i, exact = exact) :
recursive indexing failed at level 2
Wouldn't you be better off with rowMeans()? Split your dataframe into
a numeric matrix:
ssfa_num <- ssfa[10:24]
ssfameans <- rowMeans(ssfa_num)
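To illustrate the difference between the two functions, here is a small sketch with made-up data (the data frame `toy` and its columns are hypothetical). rowMeans() averages across the columns within each row and returns one value per row, while aggregate() averages down the rows within each group and returns one row per group.

```r
toy <- data.frame(
  grp = c("a", "a", "b"),
  v1  = c(1, 3, 5),
  v2  = c(3, 5, 7)
)

# One value per row: the mean of v1 and v2 within each row.
row_means <- rowMeans(toy[c("v1", "v2")])

# One row per group: the mean of each numeric column within each grp.
group_means <- aggregate(toy[c("v1", "v2")], toy["grp"], mean)

print(row_means)
print(group_means)
```

For per-group means of the kind asked about in this thread, the second form is the relevant one.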
Also col_index <- match("Asfc", ssfa) doesn't really work since
col_index is composed of 1227 NAs...
Yes, it should be:
col_index <- match("Asfc", names(ssfa))
Ben
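The corrected match() call can be used to build a column range without hard-coding positions. A minimal sketch, assuming a hypothetical data frame `toy` whose numeric columns run from Asfc through the last column:

```r
toy <- data.frame(
  SPECSHOR = c("a", "a", "b"),
  MEASUREM = c("m1", "m2", "m1"),
  Asfc     = c(1, 3, 5),
  Smc      = c(10, 20, 30)
)

# match() against the column names returns the position of "Asfc".
col_index <- match("Asfc", names(toy))

# Select from that column through the last one.
toy_num <- toy[col_index:ncol(toy)]
print(names(toy_num))
```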
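Petr's type-based selection can be tried on a small example. The sketch below uses a made-up data frame `zeta` (hypothetical values). Two assumptions are worth flagging: since R 4.0.0, data.frame() no longer converts strings to factors by default, so the grouping columns are created with factor() explicitly for is.factor() to find them; and list-style indexing (`zeta[my.numerics]`) is used instead of `zeta[, my.numerics]` so that a single selected column is not dropped to a plain vector.

```r
zeta <- data.frame(
  site  = factor(c("a", "a", "b", "b")),
  layer = factor(c("low", "low", "top", "top")),
  v1    = c(1, 3, 5, 7),
  v2    = c(2, 4, 6, 8)
)

# Column positions by type.
my.numerics <- which(sapply(zeta, is.numeric))
my.factor   <- which(sapply(zeta, is.factor))

# Mean of every numeric column within each combination of factors.
res <- aggregate(zeta[my.numerics], zeta[my.factor], mean)
print(res)
```

Selecting by type picks up every factor column, so any factor that should not define groups (MEASUREM, SEL_FACET and SEL_MEAS in the original question) would have to be removed from `my.factor` first.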