Skip to content

SUM,COUNT,AVG

14 messages · calpeda, Gabor Grothendieck, Stavros Macrakis +6 more

#
Hi,
I ve been searching a lot in internet..but I can t find a solution
Attached, you find a file.
I need for each (Materiale, tpdv, UM) to find sum,avg and count
My idea was to aggregate for the 3 parameters ..but I don t know how to get
the numeric value (SUM,COUNT,AVG) I need.
Can you help me?
thank you

http://www.nabble.com/file/p22905322/ordini2008_ex.txt ordini2008_ex.txt
#
I gather you have an SQL background since those are SQL functions.
Check out the sqldf R package and the many examples on the home
page:

http://sqldf.googlecode.com

and in ?sqldf

That may ease the transition from SQL to R.
On Mon, Apr 6, 2009 at 5:37 AM, calpeda <mauro.biasolo at calpeda.it> wrote:
#
On Mon, Apr 6, 2009 at 9:34 AM, Stavros Macrakis <macrakis at alum.mit.edu> wrote:
That's because ddply applies the function to the whole data frame, not
just the columns that aren't participating in the split.  One way
around it is:

ddply(dd, ~ b + c, function(df) each(length, sum, mean)(df$a))

I haven't figured out a more elegant way to specify this yet.

Hadley
#
calpeda:
If I have understood what you?re trying to accomplish, this should work:

$ library(Hmisc)
$ d=read.table("http://www.nabble.com/file/p22905322/ordini2008_ex.txt")
$ sumfun=function(x) c(sum=sum(x), count=length(x), avg=mean(x))
$ with(d, summarize(qta, Materiale, sumfun, stat.name=NULL))
      Materiale sum count  avg
1  14001850000   10     1 10,0
2  16006080000    2     1  2,0
3  30100300000    1     1  1,0
4  41SD0800000    3     3  1,0
5  44029740000    2     1  2,0
6  60000321000    3     3  1,0
7  60401721000    1     1  1,0
8  60900761000    2     1  2,0
9  70020030000    2     2  1,0
10 70310010000    2     2  1,0
11 70730040018    3     2  1,5
12 71710040014    1     1  1,0
#
A good package for this sort of questions is doBy, too.

library(doBy)
summaryBy( tpdv + UM + qta ~ Materiale ,data=data,FUN=c(sum,length,mean))

regards, Christian
#
Nice example. Does anyone know if it is possible to use multiple aggregating 
functions with the melt/cast functions?

Cheers,

Dylan
On Monday 06 April 2009, Christian wrote:

  
    
#
On Mon, Apr 6, 2009 at 5:31 PM, Jun Shen <jun.shen.ut at gmail.com> wrote:
In theory, you should be able to combine colwise and each:
colwise(each(min, median, max)).   That should return the min, median
and max for each column, but currently it doesn't return the values in
quite right the form for recombination with ddply.

Hadley
#
On Apr 6, 2009, at 6:31 PM, Jun Shen wrote:

            
This looks reasonably straight-forward:


lapply(c(mean,sd, length), function(func)
                             {aggregate(state.x77, list(Region =  
state.region), func)} )
#
Jun Shen:
No, summarize() can apply multiple functions to *multiple* columns
(i.e., matrices). That?s one of the advantages of summarize() over 
aggregate().
#
Jun Shen:
summarize() *does* work with multiple columns from data frames, but the 
function you use must be able to do column-wise calculations. Example:

with(iris, summarize(iris[1:4], Species, colSums, stat.name=NULL))

If there weren?t a ?colSums? function, you could easily accomplish the same 
thing by using apply on the ?sum? function:

with(iris, summarize(iris[1:4], Species, function(x) apply(x,2,sum), 
stat.name=NULL))