selecting maximum values
On 04-May-05 Roger Bivand wrote:
On Wed, 4 May 2005, Sean Davis wrote:
see ?aggregate.
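A minimal sketch of the aggregate route, assuming the nine example rows from the original question are held in a data frame (called chl here purely for illustration) and using the formula interface of aggregate(). It returns the group maximum or mean directly, though not the row ID:

```r
# Example rows copied from the original question; the name "chl" is an assumption.
chl <- data.frame(
  row         = 1:9,
  station     = c(rep("Castagneto", 4), rep("Ancona", 5)),
  date        = c(rep("21/06/01", 4), rep("23/06/01", 5)),
  depth       = c(-0.5, -1.5, -2.5, -3.5, -0.5, -1.5, -2.5, -3.5, -4.5),
  chlorophyll = c(2.0, 2.2, 2.4, 2.1, 2.4, 2.5, 2.2, 2.1, 1.9),
  stringsAsFactors = FALSE
)

# One line per station/date combination that actually occurs in the data.
chl_max  <- aggregate(chlorophyll ~ station + date, data = chl, FUN = max)
chl_mean <- aggregate(chlorophyll ~ station + date, data = chl, FUN = mean)

print(chl_max)
print(chl_mean)
```

Only combinations present in the data appear in the result, so there are no NA cells to tidy up afterwards.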
Or maybe tapply, or its close relative, by:
by(df, list(df$station, df$date), function(x)
+ x$row[which.max(x$chlorophyll)])
: Ancona
: 21/06/01
[1] NA
------------------------------------------------------------
: Castagneto
: 21/06/01
[1] 3
------------------------------------------------------------
: Ancona
: 23/06/01
[1] 6
------------------------------------------------------------
: Castagneto
: 23/06/01
[1] NA

since happily a row ID column was included in the data frame. Note
that which.max only reports the row of the first maximum if there
are ties.
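To illustrate the point about ties, here is a small sketch on a made-up vector (the values are hypothetical, not from the data frame), comparing which.max with an explicit comparison against max():

```r
# Hypothetical values with a tied maximum at positions 2 and 3.
x <- c(2.0, 2.5, 2.5, 2.1)

first_max <- which.max(x)        # position of the first maximum only
all_max   <- which(x == max(x))  # positions of every tied maximum

print(first_max)
print(all_max)
```

If ties matter, the second form could be used inside the function passed to by(), at the cost of returning vectors of varying length.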
I've tried to work out a method which gives a cleaner result
(for instance, the NAs are ugly and unnecessary).
I've called Alessandro's data (below) "chl" (for chlorophyll),
and using Roger's command above assign the result to "tmp":
tmp <- by(chl, list(chl$station, chl$date),
          function(x) x$row[which.max(x$chlorophyll)])
Then, using either tmp[1:2,] or tmp[,1:2] we get
tmp[,1:2]
##            21/06/01 23/06/01
## Ancona           NA        6
## Castagneto        3       NA
which is a better layout but still has the NAs.
It would be better to be able to get something like
## Ancona      23/06/01  6
## Castagneto  21/06/01  3
but I don't see how to do it even for just these 2 stations.
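One possible way to get that layout, offered as a sketch rather than a definitive answer: instead of by(), split the data frame with drop = TRUE so that empty station/date combinations never arise, then bind together the maximum row of each group. The data frame below re-enters the nine example rows from the original question; the names groups and best are illustrative only.

```r
# Example rows copied from the original question.
chl <- data.frame(
  row         = 1:9,
  station     = c(rep("Castagneto", 4), rep("Ancona", 5)),
  date        = c(rep("21/06/01", 4), rep("23/06/01", 5)),
  depth       = c(-0.5, -1.5, -2.5, -3.5, -0.5, -1.5, -2.5, -3.5, -4.5),
  chlorophyll = c(2.0, 2.2, 2.4, 2.1, 2.4, 2.5, 2.2, 2.1, 1.9),
  stringsAsFactors = FALSE
)

# drop = TRUE discards station/date combinations with no observations,
# which is what produced the NAs in the by() result.
groups <- split(chl, list(chl$station, chl$date), drop = TRUE)

# Keep the row with the largest chlorophyll value in each group
# (the first such row if there are ties, as with which.max in general).
best <- do.call(rbind, lapply(groups, function(x) x[which.max(x$chlorophyll), ]))
rownames(best) <- NULL

# Sort by station and show only the columns of interest.
best <- best[order(best$station), c("station", "date", "row")]
print(best)
```

Because each group contributes a whole row, the chlorophyll value or depth can be kept as well simply by selecting more columns.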
Now, however, suppose we want not just the rows but the values
as well. Try a modified function
tmp <- by(chl, list(chl$station, chl$date),
          function(x) list(Row = x$row[which.max(x$chlorophyll)],
                           Val = max(x$chlorophyll)))
Now
str(tmp)
## List of 4
## $ : NULL
## $ :List of 2
## ..$ Row: int 3
## ..$ Val: num 2.4
## $ :List of 2
## ..$ Row: int 6
## ..$ Val: num 2.5
## $ : NULL
## - attr(*, "dim")= int [1:2] 2 2
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:2] "Ancona" "Castagneto"
## ..$ : chr [1:2] "21/06/01" "23/06/01"
## - attr(*, "call")= language by.data.frame(data = chl, INDICES =
## list(chl$station, chl$date), FUN = function(x) list(Row =
## x$row[which.max(x$chlorophyll)], ...
## - attr(*, "class")= chr "by"
I've not succeeded (though experience tells me that others could)
in extracting from this something like the following:
##      Ancona   Castagneto
## Row  6        3
## Val  2.5      2.4
## Date 23/06/01 21/06/01
Questions: (a) What's the trick? (b) How to generalise it?
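A possible answer to (a), sketched under the assumption that the by() result has the structure shown by str() above: treat it as a plain list with dim and dimnames, find the non-NULL cells, and recover each cell's station and date from its array position with arrayInd(). The names cells, filled, idx and out are illustrative only.

```r
# Example rows copied from the original question.
chl <- data.frame(
  row         = 1:9,
  station     = c(rep("Castagneto", 4), rep("Ancona", 5)),
  date        = c(rep("21/06/01", 4), rep("23/06/01", 5)),
  depth       = c(-0.5, -1.5, -2.5, -3.5, -0.5, -1.5, -2.5, -3.5, -4.5),
  chlorophyll = c(2.0, 2.2, 2.4, 2.1, 2.4, 2.5, 2.2, 2.1, 1.9),
  stringsAsFactors = FALSE
)

tmp <- by(chl, list(chl$station, chl$date),
          function(x) list(Row = x$row[which.max(x$chlorophyll)],
                           Val = max(x$chlorophyll)))

# Drop the "by" class so that ordinary list and array indexing applies.
cells <- unclass(tmp)

# Positions of the cells that actually hold a result (the others are NULL).
filled <- which(!vapply(cells, is.null, logical(1)))

# Convert those linear positions to (station, date) index pairs.
idx <- arrayInd(filled, dim(cells))

out <- data.frame(
  station = dimnames(cells)[[1]][idx[, 1]],
  date    = dimnames(cells)[[2]][idx[, 2]],
  Row     = sapply(cells[filled], `[[`, "Row"),
  Val     = sapply(cells[filled], `[[`, "Val"),
  stringsAsFactors = FALSE
)
print(out)

# Transposing gives the stations-as-columns layout, as a character matrix.
print(t(out), quote = FALSE)
```

As for (b), nothing here depends on there being two stations or two dates, so the same steps should carry over to larger data; with more than two grouping factors, idx simply gains a column per factor.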
Ted.
Sean

On May 4, 2005, at 11:43 AM, alessandro carletti wrote:
Sorry for disturbing you with another newbie question!

I have a data frame about coastal waters quality parameters: for
some parameters (e.g. NH3) I have only 1 observation for each
sampling station and each sampling date, while in other cases
(chlorophyll) I have 1 obs for each meter-depth for each station
and date. How can I select only the max chlorophyll value for each
station/date?

example

row station    date     depth chlorophyll
1   Castagneto 21/06/01 -0.5  2.0
2   Castagneto 21/06/01 -1.5  2.2
3   Castagneto 21/06/01 -2.5  2.4
4   Castagneto 21/06/01 -3.5  2.1
5   Ancona     23/06/01 -0.5  2.4
6   Ancona     23/06/01 -1.5  2.5
7   Ancona     23/06/01 -2.5  2.2
8   Ancona     23/06/01 -3.5  2.1
9   Ancona     23/06/01 -4.5  1.9
...

I'd like to select only row 3 and 6, the ones with max chlorophyll
values, or have the mean for the rows 1:4 and 5:9

Thanks
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 05-May-05                                       Time: 14:13:13
------------------------------ XFMail ------------------------------