column selection for aggregate()

Try summaryBy in the doBy package. For example, using the built-in CO2
data set, this summarizes each numeric variable by each factor except for
the factors Plant and Type:

library(doBy)
summaryBy(. ~ ., data = subset(CO2, select = - c(Plant, Type)))
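
For readers without doBy installed, here is a rough base-R sketch of the
same idea using the formula interface of aggregate(). The names co2_small
and co2_means are made up for this illustration, and the call to
as.data.frame() is only there to drop the extra grouped-data classes that
CO2 carries.

```r
# Drop the two unwanted factors from the built-in CO2 data.
co2_small <- subset(as.data.frame(CO2), select = -c(Plant, Type))

# "." on the left means "every column not used on the right",
# so conc and uptake are averaged within each level of Treatment.
co2_means <- aggregate(. ~ Treatment, data = co2_small, FUN = mean)
print(co2_means)
```

The result has one row per Treatment level and one column per numeric
variable, which is the same shape summaryBy returns apart from the
column names.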

On Mon, Jan 18, 2010 at 9:53 AM, Ivan Calandra
<ivan.calandra at uni-hamburg.de> wrote:
Hi everybody!

I'm working in R today, so I have a lot of questions (you may have
noticed that this is my 3rd email today). I'm new to R, so please excuse
the "spam"!

I have a dataset "ssfa" with many rows and the column names are:
> names(ssfa)
 [1] "SPECSHOR"  "BONE"      "TO_POS"    "MEASUREM"  "FACETTE"   "SHEARFAC"
 [7] "ENA_BA"    "SEL_FACET" "SEL_MEAS"  "Asfc"      "Smc"       "epLsar"
[13] "HAsfc4"    "HAsfc9"    "HAsfc16"   "HAsfc25"   "HAsfc36"   "HAsfc49"
[19] "HAsfc64"   "HAsfc81"   "HAsfc100"  "HAsfc121"  "Tfv"       "Ftfv"

I want to aggregate this way:
ssfamean <- aggregate(ssfa[c("Asfc", "Smc", "epLsar", "HAsfc4",
"HAsfc9", "HAsfc16", "HAsfc25", "HAsfc36", "HAsfc49", "HAsfc64",
"HAsfc81", "HAsfc100", "HAsfc121", "Tfv", "Ftfv")], ssfa[c("SPECSHOR",
"BONE", "TO_POS", "FACETTE", "SHEARFAC", "ENA_BA")], mean)

As you can see, it is very long since I have many variables. Basically, I
want to select all numerical variables (columns 10 to 24) and all
categorical variables except MEASUREM, SEL_FACET and SEL_MEAS without
having to write each of them out. I would also like to avoid writing the
names; using the column indexes would be nice.
I tried:
> ssfamean <- aggregate(ssfa[c(ssfa[[10]]:ssfa[[24]])],
ssfa[c("SPECSHOR", "BONE", "TO_POS", "FACETTE", "SHEARFAC", "ENA_BA")],
mean)
but it obviously doesn't work (well, "obviously"...).

Could anyone help me with this?
Thanks in advance
Ivan

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

It looks OK, except that you have both specified the wanted factors and
removed the unwanted factors from the data frame. You only need to
do one of these, as in the example I gave, not both, so the solution
could be simpler.

On Mon, Jan 18, 2010 at 11:19 AM, Ivan Calandra wrote:
Hi!

It looks like it works perfectly.
However, since I cannot check whether I get the right result, can you
please let me know if you see any mistakes?

Here is the code:
ssfamean <- summaryBy(.~SPECSHOR+BONE+TO_POS+FACETTE+SHEARFAC+ENA_BA, data =
subset(ssfa, select = - c(MEASUREM, SEL_FACET, SEL_MEAS)), FUN=mean)

That should give me the mean for all numerical variables grouped by
SPECSHOR+BONE+TO_POS+FACETTE+SHEARFAC+ENA_BA (i.e. the mean of the rows with
equal values for all these variables) on the data file ssfa without the
columns for MEASUREM, SEL_FACET, SEL_MEAS, right?

Sorry to ask such a stupid question, but this line will give me the data I
have to analyze, so I cannot afford to make any mistake here (nor anywhere
else, of course, but here I cannot really check).
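
One way to gain confidence in a grouped mean is to recompute a single cell
by hand and compare it with the aggregated table. The sketch below does
this on the built-in CO2 data with base aggregate() rather than summaryBy,
so it does not depend on doBy; the names grp, manual and from_aggregate
are made up for this illustration. The same check could be applied to one
combination of factor levels in ssfa.

```r
co2 <- as.data.frame(CO2)

# Mean uptake for every Type x Treatment combination.
grp <- aggregate(uptake ~ Type + Treatment, data = co2, FUN = mean)

# Recompute one cell directly from the raw rows.
manual <- mean(co2$uptake[co2$Type == "Quebec" & co2$Treatment == "chilled"])
from_aggregate <- grp$uptake[grp$Type == "Quebec" & grp$Treatment == "chilled"]

# TRUE if the aggregated value matches the hand-computed one.
print(isTRUE(all.equal(manual, from_aggregate)))

# How many raw rows went into each group.
print(table(co2$Type, co2$Treatment))
```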

Thanks in advance
Ivan

Hi

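If I really wanted to aggregate all numeric columns by all factor columns,
this is how I would do it:

# zeta is the data frame; find the numeric and factor columns by position
my.numerics <- which(sapply(zeta, is.numeric))
my.factor <- which(sapply(zeta, is.factor))
# single-bracket list indexing keeps a data frame even for one column
aggregate(zeta[my.numerics], zeta[my.factor], mean)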
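
Here is a small self-contained sketch of the same approach that also
leaves out one unwanted factor, as in the original question. The data
frame toy and its values are invented for illustration; only the column
names are borrowed from ssfa.

```r
# Hypothetical toy data standing in for ssfa.
toy <- data.frame(
  SPECSHOR = factor(c("a", "a", "b", "b")),
  MEASUREM = factor(c("m1", "m2", "m1", "m2")),
  Asfc     = c(1, 3, 5, 7),
  Smc      = c(10, 20, 30, 40)
)

# Pick columns by type, then drop the factor that should not define groups.
num_cols <- names(toy)[sapply(toy, is.numeric)]
fac_cols <- setdiff(names(toy)[sapply(toy, is.factor)], "MEASUREM")

# Mean of every numeric column within each level of the remaining factor.
toy_means <- aggregate(toy[num_cols], toy[fac_cols], mean)
print(toy_means)
```

With the real data, "MEASUREM" would be replaced by
c("MEASUREM", "SEL_FACET", "SEL_MEAS").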

Regards
Petr

r-help-bounces at r-project.org wrote on 18.01.2010 16:33:17:
I didn't understand from the help page what the function rowMeans really
does, but it looks like it doesn't take the categorical variables into
account (I want to calculate the means over the rows where the values of
all categorical variables are the same, which is what the second argument
of aggregate is for). Moreover, ssfa_num contains only numeric variables,
meaning that the categories will not be associated with it.
I'm a bit confused by this approach.
Do you think it would work for me?

Thanks
Ivan

b k wrote:

On Mon, Jan 18, 2010 at 10:17 AM, Ivan Calandra
<ivan.calandra at uni-hamburg.de> wrote:

    Thanks for your answer, but it doesn't work...

    Here is what I get:
    > ssfamean <- aggregate(ssfa[[10:24]],ssfa[c("SPECSHOR", "BONE",
    "TO_POS", "FACETTE", "SHEARFAC", "ENA_BA")],mean)
    Error in .subset2(x, i, exact = exact) :
      recursive indexing failed at level 2

Wouldn't you be better off with rowMeans()? Split off the numeric columns
of your data frame:

ssfa_num  <- ssfa[10:24]

ssfameans <- rowMeans(ssfa_num)
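
To make the difference between the two functions concrete, here is a
minimal sketch on an invented data frame (demo, g, x and y are made-up
names). rowMeans() averages across the columns within each row, while
aggregate() averages down the rows within each group.

```r
demo <- data.frame(
  g = factor(c("a", "a", "b")),
  x = c(1, 3, 5),
  y = c(2, 4, 6)
)

# One value per row: the mean of x and y in that row.
row_avg <- rowMeans(demo[c("x", "y")])
print(row_avg)

# One row per level of g: the mean of x and the mean of y in that group.
grp_avg <- aggregate(demo[c("x", "y")], demo["g"], mean)
print(grp_avg)
```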

    Also col_index <- match("Asfc", ssfa) doesn't really work since
    col_index is composed of 1227 NAs...

Yes, it should be:

col_index <- match("Asfc", names(ssfa))
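
As a sketch of how such an index can be used, the example below looks up
the first and last numeric column by name and selects the whole range with
single brackets. The data frame idx_demo and its values are invented for
illustration. Note that single brackets accept a vector of positions,
whereas double brackets extract one element only, which is why
ssfa[[10:24]] fails.

```r
idx_demo <- data.frame(
  ID   = factor(c("a", "a", "b")),
  Asfc = c(1, 2, 3),
  Smc  = c(4, 5, 6),
  Ftfv = c(7, 8, 9)
)

# Positions of the first and last column of the numeric block.
first_col <- match("Asfc", names(idx_demo))
last_col  <- match("Ftfv", names(idx_demo))

# Single brackets with a range keep all columns in between.
num_block <- idx_demo[first_col:last_col]
print(names(num_block))

# Group means of that block by the ID factor.
block_means <- aggregate(num_block, idx_demo["ID"], mean)
print(block_means)
```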

Ben