
column selection for aggregate()

Gabor Grothendieck

Mon, Jan 18, 2010 10:30 AM

It looks ok except you have both specified the wanted factors and
removed the undesired factors from the data frame.  You only need to
do one of these as in the example I gave, not both, so the solution
could be simpler.
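As a hedged illustration of the point above, the two routes can be compared on the built-in CO2 data rather than on ssfa (which is not available here). The sketch assumes the doBy package is installed and relies on summaryBy treating a left-hand "." as "all numeric variables"; the names d, by_name and by_drop are made up for the example.

```r
# Sketch only: runs when doBy is installed, otherwise it is skipped.
if (requireNamespace("doBy", quietly = TRUE)) {
  d <- as.data.frame(CO2)  # plain data frame copy of the built-in data

  # Route 1: name the wanted grouping factor; no columns are dropped.
  by_name <- doBy::summaryBy(. ~ Treatment, data = d, FUN = mean)

  # Route 2: drop the unwanted factors and group by every factor left.
  by_drop <- doBy::summaryBy(. ~ .,
                             data = subset(d, select = -c(Plant, Type)),
                             FUN = mean)

  # Compare the two results instead of trusting either one.
  print(all.equal(by_name, by_drop))
}
```

Either route alone should be enough; doing both, as in the ssfa call, is harmless but redundant.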

On Mon, Jan 18, 2010 at 11:19 AM, Ivan Calandra
<ivan.calandra at uni-hamburg.de> wrote:

Hi!

It looks like it works perfectly.
However, since I cannot check whether I get the correct result, could you
please let me know if you see any mistakes?

Here is the code:
ssfamean <- summaryBy(.~SPECSHOR+BONE+TO_POS+FACETTE+SHEARFAC+ENA_BA, data =
subset(ssfa, select = - c(MEASUREM, SEL_FACET, SEL_MEAS)), FUN=mean)

That should give me the mean for all numerical variables grouped by
SPECSHOR+BONE+TO_POS+FACETTE+SHEARFAC+ENA_BA (i.e. the mean of the rows with
equal values for all these variables) on the data file ssfa without the
columns for MEASUREM, SEL_FACET, SEL_MEAS, right?

Sorry to ask such a stupid question, but this line will give me the data I
have to analyze, so I cannot afford to make any mistake here (nor anywhere
else, of course, but here I cannot really check).

Thanks in advance
Ivan


Gabor Grothendieck wrote:

Try summaryBy in the doBy package. For example, using the built-in CO2
dataset, this summarizes each numeric variable by each factor except for
the factors Plant and Type:

library(doBy)
summaryBy(. ~ ., data = subset(CO2, select = - c(Plant, Type)))
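For readers without doBy, roughly the same split into "numeric columns" and "everything else" can be sketched in base R. This is an imitation of what the ". ~ ." formula expresses, not doBy's implementation; d and num are names made up for the example.

```r
# Drop the unwanted factors first, as in the summaryBy call above.
d <- subset(as.data.frame(CO2), select = -c(Plant, Type))

# Logical vector: TRUE for numeric columns, FALSE for the remaining factors.
num <- sapply(d, is.numeric)

# Mean of every numeric column, grouped by every non-numeric column.
res <- aggregate(d[num], d[!num], mean)
print(res)
```

Unlike summaryBy, aggregate() keeps the original column names instead of appending the function name.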


On Mon, Jan 18, 2010 at 9:53 AM, Ivan Calandra
<ivan.calandra at uni-hamburg.de> wrote:


Hi everybody!

I'm working with R today so I have a lot of questions (you may have
noticed that it's the 3rd email today). I'm new to R, so please excuse
the "spam"!

I have a dataset "ssfa" with many rows and the column names are:
> names(ssfa)
 [1] "SPECSHOR"  "BONE"      "TO_POS"    "MEASUREM"  "FACETTE"   "SHEARFAC"
 [7] "ENA_BA"    "SEL_FACET" "SEL_MEAS"  "Asfc"      "Smc"       "epLsar"
[13] "HAsfc4"    "HAsfc9"    "HAsfc16"   "HAsfc25"   "HAsfc36"   "HAsfc49"
[19] "HAsfc64"   "HAsfc81"   "HAsfc100"  "HAsfc121"  "Tfv"       "Ftfv"

I want to aggregate this way:
ssfamean <- aggregate(ssfa[c("Asfc", "Smc", "epLsar", "HAsfc4",
"HAsfc9", "HAsfc16", "HAsfc25", "HAsfc36", "HAsfc49", "HAsfc64",
"HAsfc81", "HAsfc100", "HAsfc121", "Tfv", "Ftfv")], ssfa[c("SPECSHOR",
"BONE", "TO_POS", "FACETTE", "SHEARFAC", "ENA_BA")], mean)

As you can see, it is very long since I have many variables. Basically I
want to select all numerical variables (10 to 24), and all categorical
variables except MEASUREM, SEL_FACET and SEL_MEAS, without having to
write each of them. I would also like to avoid writing the names; using
the indexes would be nice.
I tried with:
> ssfamean <- aggregate(ssfa[c(ssfa[[10]]:ssfa[[24]])],
ssfa[c("SPECSHOR", "BONE", "TO_POS", "FACETTE", "SHEARFAC", "ENA_BA")],
mean)
but it obviously doesn't work (well "obviously"...)
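A likely reason the attempt fails is that ssfa[[10]] returns the values stored in column 10, so ssfa[[10]]:ssfa[[24]] builds a sequence from data values instead of from column positions. Selecting by position only needs the positions themselves inside single brackets. A minimal sketch on the built-in CO2 data (ssfa is not available here, so the ssfa line is shown as an untested comment based on the names() output above):

```r
d <- as.data.frame(CO2)  # columns: Plant, Type, Treatment, conc, uptake

# Select by position: columns 4:5 are numeric, columns 2:3 are the factors.
by_index <- aggregate(d[4:5], d[2:3], mean)

# The same call written with column names, for comparison.
by_name <- aggregate(d[c("conc", "uptake")], d[c("Type", "Treatment")], mean)

# Stops with an error if the two selections do not give the same result.
stopifnot(identical(by_index, by_name))

# Untested analogue for ssfa, assuming the column order listed by names(ssfa):
# ssfamean <- aggregate(ssfa[10:24], ssfa[c(1:3, 5:7)], mean)
```

Positions are shorter to type but break silently if the column order changes, so names are the safer choice in code that will be reused.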

Could anyone help me on this?
Thanks in advance
Ivan


______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
