Hi,
I have to admit that I don't fully understand the aggregate function, but I think it should help me with what I want to do.
xveg is a data.frame with location, species, and total for the species. Each location is repeated once for every species present at that location. For each location I want to find out which species has the maximum total, so I've tried different ways to do it using aggregate.
loc <- c(rep("L1", 3), rep("L2", 5), rep("L3", 2))
sp <- c("a", "b", "c", "a", "d", "b", "e", "c", "b", "d")
tot <- c(20, 60, 40, 15, 25, 10, 30, 20, 68, 32)
xveg <- data.frame(loc, sp, tot)
result desired:
L1 b
L2 e
L3 b
sp_maj <- aggregate(xveg[,2], list(xveg[,1]), function(x) levels(x)[which.max(table(x))])
This is wrong because it gives the first species name in each level of location, so I get a, a, b as species instead of b, e, b.
I've tried a few other aggregate commands, all with wrong results.
I would appreciate any help,
Thanks,
Monica
Aggregrate function
9 messages · Jorge Ivan Velez, Monica Pisica, Christos Hatzis +2 more
I don't have an easy solution with aggregate, because the function in
aggregate needs to return a scalar.
But the following should work:
do.call("rbind", lapply(split(xveg, xveg$loc), function(x)
x[which.max(x$tot), ]))
   loc sp tot
L1  L1  b  60
L2  L2  e  30
L3  L3  b  68
-Christos
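
The split / lapply / do.call idiom above can be unpacked one step at a time. What follows is a minimal sketch using the data from the original post. The intermediate names (pieces, top_rows, by_split, loc_max, by_merge) are illustrative and not from the thread, and stringsAsFactors = FALSE is an assumption made here so that species come back as plain character strings. The last two lines show one possible way to stay with aggregate: let it return the scalar maximum per location, then merge back to recover the species.

```r
# Data from the original post.
loc <- c(rep("L1", 3), rep("L2", 5), rep("L3", 2))
sp  <- c("a", "b", "c", "a", "d", "b", "e", "c", "b", "d")
tot <- c(20, 60, 40, 15, 25, 10, 30, 20, 68, 32)
xveg <- data.frame(loc, sp, tot, stringsAsFactors = FALSE)  # assumption: no factors

# Step 1: split() returns a named list with one data frame per location.
pieces <- split(xveg, xveg$loc)

# Step 2: keep the row with the largest total in each piece
# (which.max() returns the first maximum only).
top_rows <- lapply(pieces, function(x) x[which.max(x$tot), ])

# Step 3: stack the one-row data frames back into a single data frame.
by_split <- do.call("rbind", top_rows)

# Variant built on aggregate(): compute the scalar maximum per location,
# then merge on the shared columns (loc, tot) to recover the species.
loc_max  <- aggregate(tot ~ loc, data = xveg, FUN = max)
by_merge <- merge(xveg, loc_max)

print(by_split)
print(by_merge)
```

Both results pick species b, e, b for L1, L2, L3. The merge variant keeps every row whose total equals its location's maximum, so it would also return tied rows.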
-----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org] On Behalf Of Monica Pisica
Sent: Thursday, February 12, 2009 1:58 PM
To: R help project
Subject: [R] Aggregrate function
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Hi,
Thanks for the solution. Mark Leeds privately sent me a very similar solution. My next question to him was:
Suppose that for a certain location two species have the same maximum total (there are ties in the data for a particular location). How do I get all species that have that maximum total?
For this case I have changed tot as follows (xveg has to be rebuilt afterwards):
tot <- c(20, 60, 40, 15, 25, 15, 25, 20, 68, 32)
xveg <- data.frame(loc, sp, tot)
His solution is (and it does work):
temp <- lapply(split(xveg, loc), function(.df) {
  # indices of all rows that tie for the maximum total at this location
  maxindices <- which(.df$tot == max(.df$tot))
  data.frame(loc = .df$loc[1],
             sp = paste(.df$sp[maxindices], collapse = ","),
             tot = max(.df$tot))
})
result <- do.call(rbind, temp)
print(result)
Thanks so much again,
Monica
This requires a small modification: use which instead of which.max, since which.max returns only the first maximum:
do.call("rbind", lapply(split(xveg, xveg$loc), function(x)
  x[which(x$tot == max(x$tot)), ]))
     loc sp tot
L1    L1  b  60
L2.5  L2  d  25
L2.7  L2  e  25
L3    L3  b  68
-Christos
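
The difference between which.max and the equality test is easiest to see on a single location. This is a small sketch using the L2 rows of the tied data from the thread; the vector names (sp_L2, tot_L2, first_max, all_max) are made up for illustration.

```r
# Species and totals for location L2 in the tied data set.
sp_L2  <- c("a", "d", "b", "e", "c")
tot_L2 <- c(15, 25, 15, 25, 20)

# which.max() stops at the first maximum.
first_max <- which.max(tot_L2)

# Comparing against max() keeps every position that ties for the maximum.
all_max <- which(tot_L2 == max(tot_L2))

print(sp_L2[first_max])  # "d"
print(sp_L2[all_max])    # "d" "e"
```

The comparison uses exact equality, which is fine for whole-number totals like these; totals that come out of floating-point arithmetic may need a tolerance instead.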
-----Original Message-----
From: Monica Pisica [mailto:pisicandru at hotmail.com]
Sent: Thursday, February 12, 2009 4:35 PM
To: christos.hatzis at nuverabio.com; R help project;
markleeds at verizon.net
Subject: RE: [R] Aggregrate function
Monica -
Here's a more compact version of the same idea:
do.call(rbind, by(xveg, xveg['loc'], function(x) x[x$tot == max(x$tot), ]))
- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spector at stat.berkeley.edu
aggregate and by are convenience functions of tapply.
Consider this alternate solution:
xveg[which(xveg$tot %in% with(xveg, tapply(tot, loc, max))),"sp"]
It uses tapply to find the maximums by loc(ations) and then goes back into xveg to find the corresponding sp(ecies). You should do testing to see whether the handling of ties agrees with your needs.
--
David Winsemius
I realized later that the which might not be necessary (and in addition was reminded privately). The %in% operator returns a logical vector, which works just as well for matrix or data frame indexing as the numeric vector returned by which.
David Winsemius
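
In the spirit of the advice to test, here is a small sketch of the logical-index version together with one case worth checking. The first part uses the original data. The second part is a hypothetical variation that is not in the thread: one L1 total is changed to 30, which happens to equal the L2 maximum. Because %in% matches against the pooled maxima of all locations, that row is selected as well. Comparing each row with its own location's maximum, done here with ave, is one way around this; ave is swapped in for illustration and was not proposed in the thread.

```r
# Original data from the thread (no ties).
loc <- c(rep("L1", 3), rep("L2", 5), rep("L3", 2))
sp  <- c("a", "b", "c", "a", "d", "b", "e", "c", "b", "d")
tot <- c(20, 60, 40, 15, 25, 10, 30, 20, 68, 32)
xveg <- data.frame(loc, sp, tot, stringsAsFactors = FALSE)  # assumption: no factors

# Logical index, no which(): TRUE where tot equals the maximum of any location.
group_max <- with(xveg, tapply(tot, loc, max))
sp_in <- xveg[xveg$tot %in% group_max, "sp"]

# Hypothetical variation: species "c" at L1 now has total 30, the L2 maximum.
xveg2 <- xveg
xveg2$tot[3] <- 30
sp_in2 <- xveg2[xveg2$tot %in% with(xveg2, tapply(tot, loc, max)), "sp"]

# Compare each row with the maximum of its own location instead.
own_max <- ave(xveg2$tot, xveg2$loc, FUN = max)
sp_own2 <- xveg2[xveg2$tot == own_max, "sp"]

print(sp_in)    # "b" "e" "b"
print(sp_in2)   # "b" "c" "e" "b"  (the extra "c" is the cross-location match)
print(sp_own2)  # "b" "e" "b"
```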
Hi again,
Thanks a lot for all the suggestions. It will take me a little while to wrap my head around what is what, though! This will help me quite a bit.
One difference in the result output between your solution and Mark's solution is this:
     loc sp tot
L1    L1  b  60
L2.5  L2  d  25
L2.7  L2  e  25
L3    L3  b  68
And Mark's solution:
   loc  sp tot
L1  L1   b  60
L2  L2 d,e  25
L3  L3   b  68
I will probably use both types of solution, depending on what else I need to do with the data.
Thank you all for your help,
Monica
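
The two output shapes compared above can be derived from one another. The sketch below, on the tied data, builds the long form (one row per tied species) and then collapses it to one row per location with aggregate and paste. It is not Mark's code: ave and the aggregate call are swapped in here, the names is_max, long_res and wide_res are illustrative, and stringsAsFactors = FALSE is an assumption.

```r
# Tied data from the thread.
loc <- c(rep("L1", 3), rep("L2", 5), rep("L3", 2))
sp  <- c("a", "b", "c", "a", "d", "b", "e", "c", "b", "d")
tot <- c(20, 60, 40, 15, 25, 15, 25, 20, 68, 32)
xveg <- data.frame(loc, sp, tot, stringsAsFactors = FALSE)  # assumption: no factors

# Long form: keep every row whose total equals its location's maximum.
is_max   <- xveg$tot == ave(xveg$tot, xveg$loc, FUN = max)
long_res <- xveg[is_max, ]

# Collapsed form: one row per location, tied species pasted together.
# Extra arguments (collapse) are passed through to paste().
wide_res <- aggregate(sp ~ loc, data = long_res, FUN = paste, collapse = ",")

print(long_res)
print(wide_res)
```

The long form is convenient for further row-wise filtering or merging; the collapsed form is closer to the "L1 b / L2 e / L3 b" layout asked for in the original post.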