getting data into correct format for summarizing ... reshape, aggregate, or...

6 messages · stephen sefick, Sébastien Bihorel, Gabor Grothendieck +2 more

#
I would like to reformat this data frame into something from which I
can produce some descriptive statistics.  I have been playing around
with the reshape package, but maybe this is not the best way to
proceed.  I would like to use RiverMile and constituent as the grouping
variables to get the summary statistics:

198a    198b
mean    mean
sd      sd
...     ...

etc. for all of these.
I have tried reshape and aggregate and I am sure that I am missing something...

Below is a naive attempt at making a data frame with the columns in
the correct class.  This can be improved too.  There are NAs in the
real data set, but I didn't know how to randomly intersperse NAs in a
created matrix.  I hope this makes sense.  If it doesn't, I will go
back to the drawing board and try to clarify this.

value <- rnorm(30)
RiverMile <- c(rep(215, length.out=10), rep(202, length.out=10),
               rep(198, length.out=10))
constituent <- c(rep("a", length.out=5), rep("b", length.out=5),
                 rep("a", length.out=5), rep("b", length.out=5),
                 rep("a", length.out=5), rep("b", length.out=5))
df <- cbind(as.integer(RiverMile), as.factor(constituent), as.numeric(value))
df.1 <- as.data.frame(df)
df.1[,"V1"] <- as.integer(df.1[,"V1"])
df.1[,"V2"] <- as.factor(df.1[,"V2"])
df.1[,"V3"] <- as.numeric(df.1[,"V3"])
colnames(df.1) <- c("RiverMile", "constituent", "value")
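
On the question of scattering NAs at random through the example data: a minimal sketch, assuming the data frame is built directly with data.frame() and that five missing values are enough for testing. The name dat, the seed and the count of five are illustrative choices, not part of the original post.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible

# Build the data frame directly; data.frame() keeps each column's class.
dat <- data.frame(
  RiverMile   = as.integer(rep(c(215, 202, 198), each = 10)),
  constituent = factor(rep(rep(c("a", "b"), each = 5), times = 3)),
  value       = rnorm(30)
)

# Overwrite five randomly chosen values with NA (the count is arbitrary).
na_rows <- sample(nrow(dat), 5)
dat$value[na_rows] <- NA

str(dat)
```

sample(nrow(dat), 5) draws row indices without replacement, so exactly five distinct values become NA.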
#
On Mon, 15 Sep 2008 12:14:40 -0400,
"stephen sefick" <ssefick at gmail.com> wrote:

            
df <- data.frame(RiverMile=c(rep(215, 10), rep(202, 10), rep(198, 10)),
                 constituent=gl(2, 5, 30, labels=letters[1:2]),
                 value=rnorm(30))

by(df, list(df[[1]], df[[2]]), summary) # or build your summary function
---<---------------cut here---------------end---------------->---
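
The "build your summary function" comment could be filled in roughly as follows. This is only a sketch: my_summary is a hypothetical helper, and by() is applied to the value column alone rather than to the whole data frame, so the helper receives a plain numeric vector.

```r
df <- data.frame(RiverMile = rep(c(215, 202, 198), each = 10),
                 constituent = gl(2, 5, 30, labels = letters[1:2]),
                 value = rnorm(30))

# Hypothetical helper: the statistics to report for one group of values.
my_summary <- function(x) {
  c(mean = mean(x, na.rm = TRUE),
    sd   = sd(x, na.rm = TRUE),
    n    = sum(!is.na(x)))
}

# One cell per RiverMile/constituent combination.
res <- by(df$value,
          list(RiverMile = df$RiverMile, constituent = df$constituent),
          my_summary)
res
```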

?
#
Try this:
  RiverMile constituent  value.mean  value.sd
1       198           1 -0.06015032 0.8690358
2       198           2 -0.38923255 0.5147604
3       202           1  0.35731576 0.8280943
4       202           2  1.00463813 0.9272342
5       215           1  0.18249485 1.1861883
6       215           2 -0.10863353 0.7564736
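
The code that produced this table is not preserved in the archive. As a stand-in, here is a sketch in base R that yields a table of the same shape (one row per group, with value.mean and value.sd columns). It is an assumption about one way to get there, not necessarily the call used in the reply; dat and the seed are illustrative.

```r
set.seed(42)  # assumed seed, only so the sketch is reproducible
dat <- data.frame(RiverMile = rep(c(215, 202, 198), each = 10),
                  constituent = gl(2, 5, 30, labels = c("a", "b")),
                  value = rnorm(30))

# Mean and sd of value for every RiverMile/constituent combination.
agg <- aggregate(value ~ RiverMile + constituent, data = dat,
                 FUN = function(x) c(mean = mean(x), sd = sd(x)))

# aggregate() returns the two statistics as one matrix column;
# do.call(data.frame, ...) flattens it into value.mean and value.sd.
out <- do.call(data.frame, agg)
out <- out[order(out$RiverMile, out$constituent), ]
out
```

Note that the formula interface of aggregate() drops rows with NA in value by default (na.action = na.omit).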
#
I think your problem is coming from the cbind.  You are forcing the data into a matrix, not a data.frame, so every column is coerced to a single type. Neither aggregate nor cast will work properly on that matrix.

Do a str(df) or class(df) on the result of the cbind and you will see what is happening.
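
A small sketch of the coercion, using the same construction as the original post. The names m and d are illustrative only.

```r
RiverMile   <- rep(c(215, 202, 198), each = 10)
constituent <- rep(rep(c("a", "b"), each = 5), times = 3)
value       <- rnorm(30)

# cbind() on plain vectors returns a matrix, and a matrix has one type.
m <- cbind(as.integer(RiverMile), as.factor(constituent), as.numeric(value))
is.matrix(m)
typeof(m)        # "double": the integer and factor columns were coerced
unique(m[, 2])   # the labels "a"/"b" are gone; only the codes 1 and 2 remain

# data.frame() keeps one class per column.
d <- data.frame(RiverMile, constituent = factor(constituent), value)
sapply(d, class)
```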

Try this using the reshape package.  Note the code runs but I have not verified the results. The function approach comes from Hadley's vignette at had.co.nz/reshape/introduction.pdf .
===================================================================== 

library(reshape)  # provides cast() and melt()
df1 <- data.frame(RiverMile, constituent, value)
cast(df1, RiverMile + constituent ~ ., function(x) c(means = mean(x), SD = sd(x)))
=====================================================================
#
Thanks, all.  I ended up using:
cast(melt(df1), RiverMile + constituent ~ .,
     function(x) c(means = mean(x, na.rm=TRUE), SD = sd1(x, na.rm=TRUE),
                   CV = CV(x, na.rm=TRUE), MIN = min(x), MAX = max(x),
                   twentyfive = Q25(x), seventyfive = Q75(x), n = valid.n(x)))

and this worked quite well for my needs.
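
The helpers sd1, CV, Q25, Q75 and valid.n are not defined anywhere in the thread; they may come from an add-on package or from the poster's own code. A minimal sketch of what they might look like, assuming they are thin wrappers around base R functions (every definition below is a guess for illustration):

```r
# Hypothetical stand-ins; the poster's real definitions are not shown.
sd1     <- function(x, na.rm = FALSE) sd(x, na.rm = na.rm)
CV      <- function(x, na.rm = FALSE) sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
Q25     <- function(x) unname(quantile(x, 0.25, na.rm = TRUE))
Q75     <- function(x) unname(quantile(x, 0.75, na.rm = TRUE))
valid.n <- function(x) sum(!is.na(x))  # number of non-missing values

# Try the same set of statistics on a small vector with one NA.
x <- c(2, 4, NA, 6, 8)
c(means = mean(x, na.rm = TRUE), SD = sd1(x, na.rm = TRUE),
  CV = CV(x, na.rm = TRUE),
  MIN = min(x, na.rm = TRUE), MAX = max(x, na.rm = TRUE),
  twentyfive = Q25(x), seventyfive = Q75(x), n = valid.n(x))
```

One thing to watch: min(x) and max(x) without na.rm = TRUE return NA for any group that contains a missing value, so the sketch passes na.rm = TRUE there as well.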
On Mon, Sep 15, 2008 at 12:48 PM, John Kane <jrkrideau at yahoo.ca> wrote:

  
    
#
Hi

Another possibility is to use a split/sapply construction:

sapply(split(df.1[,3],  list(df.1$RiverMile, df.1$constituent)), summary)
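
The same construction with a custom function in place of summary gives one column per group and one row per statistic, which is close to the layout asked for at the top of the thread. A sketch, assuming mean, sd and the count of non-missing values are the statistics wanted; dat, the seed and the two NA positions are illustrative.

```r
set.seed(7)  # assumed seed, only so the sketch is reproducible
dat <- data.frame(RiverMile = rep(c(215, 202, 198), each = 10),
                  constituent = gl(2, 5, 30, labels = c("a", "b")),
                  value = rnorm(30))
dat$value[c(3, 17)] <- NA  # two missing values for illustration

# One numeric vector per RiverMile/constituent combination.
groups <- split(dat$value, list(dat$RiverMile, dat$constituent))

# Rows are statistics, columns are groups named like "198.a".
stats <- sapply(groups, function(x) c(mean = mean(x, na.rm = TRUE),
                                      sd   = sd(x, na.rm = TRUE),
                                      n    = sum(!is.na(x))))
round(stats, 3)
```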

Regards

Petr Pikal
petr.pikal at precheza.cz

