Summary information by groups programming assistance
The sorting should have been by Lake, psd and vol (not what I had),
so it should be revised to:
DFo <- DF[order(DF$Lake, DF$psd, DF$vol), ]
aggregate(DFo[c("Length", "vol")], DFo[c("Lake", "psd")], tail, 1)
This is the same as before except DF$psd is used in place of DF$Length
in the first line.
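Here is a minimal, self-contained check of the corrected approach, offered as a sketch. The data frame is rebuilt by hand from the eleven sample rows in the original post, using only the psd, Lake, Length and vol columns, so it covers just the Clear/substock and Clear/S-Q groups. Because the rows are sorted by vol within each Lake/psd group, tail(x, 1) should return the Length and vol of the same max-vol row.

```r
# Hand-rebuilt from the sample rows in the original post (subset of columns)
DF <- data.frame(
  psd    = c(rep("substock", 9), rep("S-Q", 2)),
  Lake   = "Clear",
  Length = c(150, 152, 152, 159, 159, 165, 171, 178, 179, 180, 180),
  vol    = c(0.0105, 0.0105, 3.2655, 0.0420, 0.0105, 0.0315,
             0.2205, 0.0105, 1.4490, 0.0105, 0.0210)
)

# Sort by Lake, then psd, then vol, so the last row of each
# Lake/psd group is the row with the largest vol
DFo <- DF[order(DF$Lake, DF$psd, DF$vol), ]

# tail(x, 1) takes the last value of each column within each group,
# so Length and vol both come from that same max-vol row
res <- aggregate(DFo[c("Length", "vol")], DFo[c("Lake", "psd")], tail, 1)
print(res)
```

If two rows in a group share the same maximum vol, this approach keeps only one of them.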
On Mon, Dec 22, 2008 at 9:14 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
Just sort the data first and then apply any of the solutions but with tail(x, 1)
instead of max, e.g.
DFo <- DF[order(DF$Lake, DF$Length, DF$vol), ]
aggregate(DFo[c("Length", "vol")], DFo[c("Lake", "psd")], tail, 1)
On Mon, Dec 22, 2008 at 8:15 PM, Ranney, Steven
<steven.ranney at montana.edu> wrote:
Thank you all for your help. I appreciate the assistance.

I'm thinking I should have been more specific in my original question. Unless I'm mistaken, all of the suggestions so far have been for maximum vol and maximum Length by Lake and psd. I'm trying to extract the max vol by Lake and psd along with the corresponding value of Length. So, instead of maximum vol and maximum Length, I'd like to find the max vol and the Length associated with that value.

Sorry for any confusion,

SR

Steven H. Ranney
Graduate Research Assistant (Ph.D)
USGS Montana Cooperative Fishery Research Unit
Montana State University
P.O. Box 173460
Bozeman, MT 59717-3460
phone: (406) 994-6643
fax: (406) 994-7479
http://studentweb.montana.edu/steven.ranney
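One base-R alternative to the sort-then-tail idea, sketched here on sample rows rebuilt by hand from the original post (four columns only), is to split the row indices by Lake and psd and keep the index of the largest vol in each group. This is a different technique from the ones proposed in the thread. which.max returns the first maximum, so ties resolve to the earliest row.

```r
# Hand-rebuilt from the sample rows in the original post (subset of columns)
DF <- data.frame(
  psd    = c(rep("substock", 9), rep("S-Q", 2)),
  Lake   = "Clear",
  Length = c(150, 152, 152, 159, 159, 165, 171, 178, 179, 180, 180),
  vol    = c(0.0105, 0.0105, 3.2655, 0.0420, 0.0105, 0.0315,
             0.2205, 0.0105, 1.4490, 0.0105, 0.0210)
)

# Split row indices by Lake/psd combination (drop = TRUE skips empty combinations)
groups <- split(seq_len(nrow(DF)), list(DF$Lake, DF$psd), drop = TRUE)

# Within each group, pick the index of the row with the largest vol;
# which.max returns the first one if several rows tie
idx <- sapply(groups, function(i) i[which.max(DF$vol[i])])

# Pull out whole rows, so Length stays paired with its vol
res <- DF[idx, c("Lake", "psd", "Length", "vol")]
print(res)
```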
________________________________
From: Gabor Grothendieck [mailto:ggrothendieck at gmail.com]
Sent: Mon 12/22/2008 5:15 PM
To: Ranney, Steven
Cc: r-help at r-project.org
Subject: Re: [R] Summary information by groups programming assistance
Here are two solutions assuming DF is your data frame:
# 1. aggregate is in the base of R
aggregate(DF[c("Length", "vol")], DF[c("Lake", "psd")], max)
or the following which is the same except it labels psd as Category:
aggregate(DF[c("Length", "vol")], with(DF, list(Lake = Lake, Category = psd)), max)
# 2. sqldf. The sqldf package allows specification using SQL notation:
library(sqldf)
sqldf("select Lake, psd as Category, max(Length), max(vol) from DF
group by Lake, psd")
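A quick check on sample rows rebuilt by hand from the original post (a sketch, four columns only) shows why both solutions above differ from what was eventually asked. max is applied to each column on its own, so the reported Length and vol need not come from the same fish.

```r
# Hand-rebuilt from the sample rows in the original post (subset of columns)
DF <- data.frame(
  psd    = c(rep("substock", 9), rep("S-Q", 2)),
  Lake   = "Clear",
  Length = c(150, 152, 152, 159, 159, 165, 171, 178, 179, 180, 180),
  vol    = c(0.0105, 0.0105, 3.2655, 0.0420, 0.0105, 0.0315,
             0.2205, 0.0105, 1.4490, 0.0105, 0.0210)
)

# Column-wise maxima within each Lake/psd group
res_max <- aggregate(DF[c("Length", "vol")], DF[c("Lake", "psd")], max)
print(res_max)

# For substock, max Length is 179, but the max vol (3.2655)
# belongs to the row whose Length is 152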
There are many other good solutions too using various packages which
have already been mentioned on this thread.
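For the max-vol-plus-matching-Length question, one further base-R option is to compute the maximum vol per Lake/psd group and merge it back onto the data to recover Length. This is an alternative, not one of the package-based solutions mentioned above. Here it is sketched on sample rows rebuilt by hand from the original post:

```r
# Hand-rebuilt from the sample rows in the original post (subset of columns)
DF <- data.frame(
  psd    = c(rep("substock", 9), rep("S-Q", 2)),
  Lake   = "Clear",
  Length = c(150, 152, 152, 159, 159, 165, 171, 178, 179, 180, 180),
  vol    = c(0.0105, 0.0105, 3.2655, 0.0420, 0.0105, 0.0315,
             0.2205, 0.0105, 1.4490, 0.0105, 0.0210)
)

# Step 1: maximum vol within each Lake/psd group
mx <- aggregate(vol ~ Lake + psd, data = DF, FUN = max)

# Step 2: join back on Lake, psd and vol to recover the matching Length
res <- merge(mx, DF[c("Lake", "psd", "Length", "vol")],
             by = c("Lake", "psd", "vol"))
print(res)
```

Unlike the tail(x, 1) approach, this returns every tied row if several fish in a group share the maximum vol.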
On Mon, Dec 22, 2008 at 4:51 PM, Ranney, Steven
<steven.ranney at montana.edu> wrote:
All -
I have data that looks like
    psd      Species Lake  Length Weight St.weight Wr        Wr.1      vol
432 substock SMB     Clear 150    41.00  0.01      95.12438  95.10118  0.0105
433 substock SMB     Clear 152    39.00  0.01      86.72916  86.70692  0.0105
434 substock SMB     Clear 152    40.00  3.11      88.95298  82.03689  3.2655
435 substock SMB     Clear 159    48.00  0.04      92.42095  92.34393  0.0420
436 substock SMB     Clear 159    48.00  0.01      92.42095  92.40170  0.0105
437 substock SMB     Clear 165    47.00  0.03      80.38023  80.32892  0.0315
438 substock SMB     Clear 171    62.00  0.21      94.58105  94.26070  0.2205
439 substock SMB     Clear 178    70.00  0.01      93.91912  93.90571  0.0105
440 substock SMB     Clear 179    76.00  1.38      100.15760 98.33895  1.4490
441 S-Q      SMB     Clear 180    75.00  0.01      97.09330  97.08035  0.0105
442 S-Q      SMB     Clear 180    92.00  0.02      119.10111 119.07522 0.0210
...
where psd and Lake are categorical variables, with five and four
categories, respectively. I'd like to find the maximum vol, and the
Length associated with each maximum vol, for each psd category within each lake.
In other words, I'd like to have a data frame that looks something like
Lake     Category Length vol
Clear    substock 152    3.2655
Clear    S-Q      266    11.73
Clear    Q-P      330    14.89
...
Pickerel substock 170    3.4965
Pickerel S-Q      248    10.69
Pickerel Q-P      335    25.62
Pickerel P-M      415    32.62
Pickerel M-T      442    17.25
To get this originally, I used
with(smb[smb$Lake=="Clear",], tapply(vol, list(Length, psd), max))
with(smb[smb$Lake=="Enemy.Swim",], tapply(vol, list(Length, psd), max))
with(smb[smb$Lake=="Pickerel",], tapply(vol, list(Length, psd), max))
with(smb[smb$Lake=="Roy",], tapply(vol, list(Length, psd), max))
and pulled the values I needed out by hand and put them into a .csv.
Unfortunately, I've got a number of other data sets on which I'll need
to do the same analysis. A programmable alternative would
provide a much easier (and likely less error-prone) way to achieve
the same results. Ideally, the "Length" and "vol" data would be in a
data frame so that I could then analyze them with nls.
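As a sketch of a programmable replacement for the four per-lake tapply calls, ave() can compute the group maximum of vol for every row across all Lake/psd combinations at once. Comparing each vol against that maximum keeps the max-vol rows together with their Length. The sample rows below are rebuilt by hand from the post. The result is an ordinary data frame, but the nls model itself is not described here, so it is not shown.

```r
# Hand-rebuilt from the sample rows in the original post (subset of columns)
DF <- data.frame(
  psd    = c(rep("substock", 9), rep("S-Q", 2)),
  Lake   = "Clear",
  Length = c(150, 152, 152, 159, 159, 165, 171, 178, 179, 180, 180),
  vol    = c(0.0105, 0.0105, 3.2655, 0.0420, 0.0105, 0.0315,
             0.2205, 0.0105, 1.4490, 0.0105, 0.0210)
)

# For each row, the maximum vol within its Lake/psd group
# (all lakes are handled in one call, no per-lake subsetting)
grp_max <- ave(DF$vol, DF$Lake, DF$psd, FUN = max)

# Keep rows whose vol equals their group's maximum; tied rows are all kept
res <- DF[DF$vol == grp_max, c("Lake", "psd", "Length", "vol")]
print(res)
```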
Does anyone have any thoughts as to how I might accomplish this?
Thanks in advance,
Steven Ranney
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.