subtotal, submean, aggregate

8 messages · Gabor Grothendieck, Roger Bivand, Patrick Giraudoux

#
Dear All,

I would like to compute partial sums (or means, or any other function)
of the values over intervals along a sequence (a spatial transect) where
groups are defined.

For instance:

habitats<-rep(c("meadow","forest","meadow","pasture"),c(10,5,12,6))
observations<-rpois(length(habitats),2)
transect<-data.frame(observations=observations,habitats=habitats)

aggregate() is not suitable for my purpose because I want a result that
respects the order in which the habitats are encountered, even though
several runs may have the same name (that is, without pooling all the
observations for each level of the factor). For instance, the output of
an ideal function mynicefunction() would be something like:

mynicefunction(transect$observations, by=list(transect$habitats),sum)
meadow     16
forest      9
meadow     21
pasture    17

and not

aggregate(transect$observations,by=list(transect$habitats),sum)
  Group.1  x
1  forest  9
2  meadow 37
3 pasture 17

Has anybody heard of such a function already written in R? If not, any
ideas on how to write it simply and elegantly?

Cheers,

Patrick Giraudoux
#
Create another variable that gives the run number and aggregate on
both the habitat and run number removing the run number after
aggregating:

runno <- cumsum(c(TRUE, diff(as.numeric(transect[,2])) !=0))
aggregate(transect[,1], list(obs = transect[,2], runno = runno), sum)[,-2]

This does not give the same result as your example, but I think there
are some errors in your example output.
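
To see what the run number looks like, here is a minimal sketch on a short toy vector. The vector h and its values are made up for illustration, and factor() is added as an assumption so that the conversion to numeric codes also works when the habitats are stored as character strings rather than as a factor:

```r
# Toy habitat sequence (hypothetical, not the poster's data)
h <- c("meadow", "meadow", "forest", "meadow", "meadow", "pasture")

# Integer code per habitat; factor() is applied explicitly so this also
# works when the input is character rather than factor
codes <- as.numeric(factor(h))

# A new run starts wherever the code differs from the previous one;
# the leading TRUE opens the first run
runno <- cumsum(c(TRUE, diff(codes) != 0))
print(runno)  # 1 1 2 3 3 4
```

The two separate meadow stretches get run numbers 1 and 3, which is what keeps them apart when aggregating.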
On 2/26/06, Patrick Giraudoux <patrick.giraudoux at univ-fcomte.fr> wrote:
#
On Sun, 26 Feb 2006, Patrick Giraudoux wrote:

            
I got as far as:

rle.habs <- rle(habitats)
habitats1 <- rep(make.names(rle.habs$values, unique=TRUE), rle.habs$lengths)
aggregate(observations,by=list(habitats1),sum)

making an extra habitats vector with a unique label for each run. 

Since I don't know your seed, the results are not the same, but rle() is 
quite good for runs.
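
As far as I can tell, aggregate() returns the groups sorted by label, so the order along the transect is not kept by the lines above. A rough sketch of an order-preserving variant built on rle() and tapply() follows. The function name run_summary and the toy data are invented for illustration and are not an existing R function:

```r
# Sketch of an order-preserving run summary; `run_summary` is a made-up name
run_summary <- function(x, g, FUN = sum) {
  r <- rle(as.character(g))                        # runs of identical labels
  run_id <- rep(seq_along(r$lengths), r$lengths)   # 1, 1, ..., 2, 2, ...
  data.frame(group = r$values,
             value = as.vector(tapply(x, run_id, FUN)),
             stringsAsFactors = FALSE)
}

# Toy data (hypothetical values)
g <- c("meadow", "meadow", "forest", "meadow", "pasture", "pasture")
x <- c(1, 2, 3, 4, 5, 6)
res <- run_summary(x, g)
print(res)
```

Grouping on the integer run index rather than on the label is what keeps the rows in transect order.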

Roger

  
    
#
We are just comparing the difference to 0, so it does not matter whether
it is positive or negative.  All that matters is whether it is 0 or not.

In fact, the runno you calculate with the abs is identical to the one
I posted without the abs:

runno <- cumsum(c(TRUE, abs(diff(as.numeric(transect[,2])))!=0))
runno2 <- cumsum(c(TRUE, diff(as.numeric(transect[,2])) != 0))
identical(runno, runno2)  # TRUE
______________________________________________
R-help at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-help
#
Yes, that must be it.  Probably best to issue a:

set.seed(1)

as part of the code when posting examples with random numbers.

Also here is a variation that uses rle that Roger used together with
some elements of the solution I posted:

runno <- with(rle(as.numeric(transect[,2])), rep(seq(along = lengths), lengths))
aggregate(transect[,1], list(obs = transect[,2], runno), sum)[,-2]
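
Putting the pieces together, a reproducible version of the whole example might look like the sketch below. The seed value is arbitrary, and as.character() is an added safeguard (an assumption, not part of the lines above) so that the code behaves the same whether the habitats column is stored as a factor or as character strings:

```r
set.seed(1)  # fix the random seed so everyone sees the same counts

habitats <- rep(c("meadow", "forest", "meadow", "pasture"), c(10, 5, 12, 6))
observations <- rpois(length(habitats), 2)
transect <- data.frame(observations = observations, habitats = habitats)

# Run number from rle(); as.character() makes this independent of
# whether the column is a factor or a character vector
runno <- with(rle(as.character(transect$habitats)),
              rep(seq_along(lengths), lengths))

# Aggregate on habitat and run number, then drop the run number column
res <- aggregate(transect$observations,
                 list(obs = transect$habitats, runno = runno), sum)[, -2]
print(res)
```

The result should have one row per run, four in total, with the two meadow stretches kept separate.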
On 2/26/06, Patrick Giraudoux <patrick.giraudoux at univ-fcomte.fr> wrote:
Actually the discrepancy you noticed remaining comes from [...]
difference in [...]
diff(as.numeric(transect[,2]))
One can work it around [...]
makes:

runno <- [...]
aggregate(transect[,1], list(obs = [...]
I did not know about this use of diff, [...]
cumsum for polishing. Really great and [...]
Thanks a [...]
Cheers,

Patrick