
Aggregate function

9 messages · Jorge Ivan Velez, Monica Pisica, Christos Hatzis +2 more

#
Hi,
 
I have to admit that I don't fully understand the aggregate function, but I think it should help me with what I want to do.
 
xveg is a data.frame with location, species, and total for the species. Each location is repeated, once for every species present at that location. For each location I want to find out which species has the maximum total, so I've tried different ways to do it using aggregate.
 
loc <- c(rep("L1", 3), rep("L2", 5), rep("L3", 2))
sp <- c("a", "b", "c", "a", "d", "b", "e", "c", "b", "d")
tot <- c(20, 60, 40, 15, 25, 10, 30, 20, 68, 32)
xveg <- data.frame(loc, sp, tot)
 
result desired:
 
L1   b
L2   e
L3   b
 
sp_maj <- aggregate(xveg[,2], list(xveg[,1]), function(x) levels(x)[which.max(table(x))])
 
This is wrong because it gives the first species name in each level of location, so I get a, a, b as species instead of b, e, b.
 
I've tried a few other aggregate commands, all with wrong results.
 
I will appreciate any help,
 
Thanks,
 
Monica
 
#
I don't have an easy solution with aggregate, because the function in
aggregate needs to return a scalar.
But the following should work:

do.call("rbind", lapply(split(xveg, xveg$loc), function(x)
x[which.max(x$tot), ]))

   loc sp tot
L1  L1  b  60
L2  L2  e  30
L3  L3  b  68 

-Christos
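
For readers who still want aggregate in the pipeline, here is a minimal sketch (not from the thread): let aggregate return the scalar maximum per location, then merge that back onto xveg to recover the species. The names loc_max and the final column order are illustrative choices.

```r
# Data from the original question
loc <- c(rep("L1", 3), rep("L2", 5), rep("L3", 2))
sp <- c("a", "b", "c", "a", "d", "b", "e", "c", "b", "d")
tot <- c(20, 60, 40, 15, 25, 10, 30, 20, 68, 32)
xveg <- data.frame(loc, sp, tot)

# Step 1: aggregate returns one scalar (the maximum total) per location
loc_max <- aggregate(tot ~ loc, data = xveg, FUN = max)

# Step 2: keep only the rows of xveg whose (loc, tot) pair matches a maximum
sp_maj <- merge(xveg, loc_max, by = c("loc", "tot"))
sp_maj <- sp_maj[order(sp_maj$loc), c("loc", "sp", "tot")]
print(sp_maj)
```

Because the merge matches on both loc and tot, tied species at a location would all be kept.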
#
Hi, 
 
Thanks for the solution. Mark Leeds sent me privately a very similar solution. My next question to him was:
 
Suppose that for a certain location two species have the same maximum total (there are ties in the data for a particular location). How do I get all species that have that maximum total?
 
For this case I have changed tot as follows:
 
tot <-  c(20, 60, 40, 15, 25, 15, 25, 20, 68, 32)
 
His solution is (and it does work):
 
temp <- lapply(split(xveg,loc), function(.df) {
  maxindices <- which(.df$tot == .df$tot[which.max(.df$tot)])
  data.frame(loc=.df$loc[1],sp=paste(.df$sp[maxindices],collapse=","),tot=max(.df$tot))
})

result <- do.call(rbind,temp)
print(result)
 
Thanks so much again,
 
Monica
#
This requires a small modification: use which instead of which.max,
which returns only the first maximum:

do.call("rbind", lapply(split(xveg, xveg$loc), function(x) x[which(x$tot ==
max(x$tot)), ]))

     loc sp tot
L1    L1  b  60
L2.5  L2  d  25
L2.7  L2  e  25
L3    L3  b  68 

-Christos
#
Monica -
    Here's a more compact version of the same idea:

   do.call(rbind,by(xveg,xveg['loc'],function(x)x[x$tot == max(x$tot),]))

                                        - Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu
On Thu, 12 Feb 2009, Monica Pisica wrote:

            
#
aggregate and by are convenience wrappers around tapply.

Consider this alternate solution:

xveg[which(xveg$tot %in% with(xveg, tapply(tot, loc, max))),"sp"]

It uses tapply to find the maximums by loc(ation) and then goes
back into xveg to find the corresponding sp(ecies).  You should
test whether the handling of ties agrees with your needs.

--
David Winsemius
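
One case worth testing with the %in% approach: it compares every total against the pooled set of maxima, so a non-maximal total that happens to equal another location's maximum is also selected. The sketch below uses hypothetical data (the last total changed from 32 to 30, not a value from the thread) and compares it with a per-location check built on ave.

```r
# Hypothetical variation of the thread's data: the last total is 30,
# which equals the maximum of L2 but is not the maximum of L3
loc <- c(rep("L1", 3), rep("L2", 5), rep("L3", 2))
sp <- c("a", "b", "c", "a", "d", "b", "e", "c", "b", "d")
tot <- c(20, 60, 40, 15, 25, 10, 30, 20, 68, 30)
xveg <- data.frame(loc, sp, tot)

# Pooled comparison: is tot equal to the maximum of ANY location?
pooled <- xveg$tot %in% with(xveg, tapply(tot, loc, max))

# Per-location comparison: ave() repeats each location's maximum on every
# row of that location, so each total is checked against its own group only
per_group <- xveg$tot == ave(xveg$tot, xveg$loc, FUN = max)

print(xveg[pooled, ])     # includes the extra L3 row with tot 30
print(xveg[per_group, ])  # one row per location here
```

The per-location version still returns every tied species, since all rows equal to their own group maximum are kept.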

On Feb 12, 2:56 pm, "Christos Hatzis" <christos.hat... at nuverabio.com>
wrote:
#
I realized later that the which might not be necessary (and in  
addition was reminded privately). The %in% function returns a logical  
vector which works just as well with matrix or dataframe indexing as  
the numeric vector returned by which.
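
A small sketch of that equivalence, using the tied data from earlier in the thread. The variable names keep, by_position and by_logical are illustrative. Since %in% never returns NA, the logical vector can be used directly as a row index.

```r
# Tied data from earlier in the thread
loc <- c(rep("L1", 3), rep("L2", 5), rep("L3", 2))
sp <- c("a", "b", "c", "a", "d", "b", "e", "c", "b", "d")
tot <- c(20, 60, 40, 15, 25, 15, 25, 20, 68, 32)
xveg <- data.frame(loc, sp, tot)

# Logical vector, one element per row of xveg
keep <- xveg$tot %in% with(xveg, tapply(tot, loc, max))

by_position <- xveg[which(keep), "sp"]  # index by row numbers
by_logical <- xveg[keep, "sp"]          # index by the logical vector itself

# Both forms select the same rows
print(identical(by_position, by_logical))
```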
#
Hi again,
 
Thanks a lot for all the suggestions. It will take me a little while to wrap my head around what is what, though! This will help me quite a bit.
 
One difference in the result output between your solution and Mark's solution is this:
     loc sp tot
L1    L1  b  60
L2.5  L2  d  25
L2.7  L2  e  25
L3    L3  b  68

And Mark's solution:
   loc  sp tot
L1  L1   b  60
L2  L2 d,e  25
L3  L3   b  68

I will probably use both types of solution, depending on what else I need to do with the data.
 
Thank you all for your help,
 
Monica
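
The two output shapes can be converted into each other. As a rough sketch (the names long, collapsed and wide are made up for illustration), the one-row-per-species result can be collapsed into the comma-separated form with tapply and paste:

```r
# Tied data from the thread
loc <- c(rep("L1", 3), rep("L2", 5), rep("L3", 2))
sp <- c("a", "b", "c", "a", "d", "b", "e", "c", "b", "d")
tot <- c(20, 60, 40, 15, 25, 15, 25, 20, 68, 32)
xveg <- data.frame(loc, sp, tot)

# Long form: one row per tied species
long <- do.call("rbind", lapply(split(xveg, xveg$loc),
                                function(x) x[x$tot == max(x$tot), ]))

# Collapsed form: paste the species of each location into one string
collapsed <- tapply(as.character(long$sp), long$loc, paste, collapse = ",")
wide <- data.frame(loc = names(collapsed),
                   sp = as.vector(collapsed),
                   tot = as.vector(tapply(long$tot, long$loc, max)))
print(wide)
```

Keeping the long form as the working result and collapsing only for display avoids having to split the strings again later.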

