
Message-ID: <XFMail.050505141700.Ted.Harding@nessie.mcc.ac.uk>
Date: 2005-05-05T13:17:00Z
From: (Ted Harding)
Subject: selecting maximum values
In-Reply-To: <Pine.LNX.4.44.0505041925580.23757-100000@reclus.nhh.no>

On 04-May-05 Roger Bivand wrote:
> On Wed, 4 May 2005, Sean Davis wrote:
> 
>> see ?aggregate.
> 
> Or maybe tapply, or its close relative, by:
> 
>> by(df, list(df$station, df$date), function(x) 
> +   x$row[which.max(x$chlorophyll)]) 
>: Ancona
>: 21/06/01
> [1] NA
> ------------------------------------------------------------ 
>: Castagneto
>: 21/06/01
> [1] 3
> ------------------------------------------------------------ 
>: Ancona
>: 23/06/01
> [1] 6
> ------------------------------------------------------------ 
>: Castagneto
>: 23/06/01
> [1] NA
> 
> since happily a row ID column was included in the data frame. Note that
> which.max only reports the row of the first maximum if there are ties.
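
A minimal sketch of the tie behaviour mentioned above, using a made-up
vector (the values are illustrative and not taken from the data in this
thread):

```r
# Illustrative values only: the maximum 2.5 occurs twice.
x <- c(2.4, 2.5, 2.5, 2.1)

# which.max() returns the position of the first maximum only.
first_max <- which.max(x)

# Comparing against max(x) returns the positions of all tied maxima.
all_max <- which(x == max(x))

print(first_max)
print(all_max)
```

If ties matter, the second form could be used inside the function passed
to by(), at the cost of returning more than one row ID for some groups.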

I've tried to work out a method which gives a cleaner result
(for instance, the NAs are ugly and unnecessary).

I've called Alessandro's data (below) "chl" (for chlorophyll)
and, using Roger's command above, assigned the result to "tmp":

  tmp <- by(chl, list(chl$station, chl$date),
            function(x) x$row[which.max(x$chlorophyll)])

Then, using either tmp[1:2,] or tmp[,1:2] we get

  tmp[,1:2]
  ##            21/06/01 23/06/01
  ## Ancona           NA        6
  ## Castagneto        3       NA

which is a better layout but still has the NAs.

It would be better to be able to get something like

  ## Ancona     23/06/01        6
  ## Castagneto 21/06/01        3

but I don't see how to do it even for just these 2 stations.
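
One possible way to get that layout, sketched under assumptions: "chl"
is rebuilt here from only the nine rows shown in Alessandro's example
(the rows elided by "..." are not included), and the names "pieces" and
"best" are made up for illustration. The idea is to split the data frame
by station/date, dropping the empty combinations that produce the NAs,
and keep the which.max row of each piece.

```r
# Reconstruction of the nine example rows quoted in this thread.
chl <- data.frame(
  row         = 1:9,
  station     = c(rep("Castagneto", 4), rep("Ancona", 5)),
  date        = c(rep("21/06/01", 4), rep("23/06/01", 5)),
  depth       = c(-0.5, -1.5, -2.5, -3.5, -0.5, -1.5, -2.5, -3.5, -4.5),
  chlorophyll = c(2.0, 2.2, 2.4, 2.1, 2.4, 2.5, 2.2, 2.1, 1.9),
  stringsAsFactors = FALSE
)

# drop = TRUE discards station/date combinations with no observations,
# so no NA (or NULL) cells are ever created.
pieces <- split(chl, list(chl$station, chl$date), drop = TRUE)

# Keep the row with the (first) maximum chlorophyll in each piece.
best <- do.call(rbind,
                lapply(pieces, function(x) x[which.max(x$chlorophyll), ]))

# Order by station and keep only the columns wanted in the layout.
best <- best[order(best$station), c("station", "date", "row")]
rownames(best) <- NULL
print(best)
```

This gives one line per station/date combination that actually occurs,
so it is not restricted to two stations.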

Now, however, suppose we want not just the rows but the values
as well. Try a modified function:

  tmp <- by(chl, list(chl$station, chl$date),
            function(x) list(Row = x$row[which.max(x$chlorophyll)],
                             Val = max(x$chlorophyll)))

Now

  str(tmp)
  ## List of 4
  ##  $ : NULL
  ##  $ :List of 2
  ##   ..$ Row: int 3
  ##   ..$ Val: num 2.4
  ##  $ :List of 2
  ##   ..$ Row: int 6
  ##   ..$ Val: num 2.5
  ##  $ : NULL
  ##  - attr(*, "dim")= int [1:2] 2 2
  ##  - attr(*, "dimnames")=List of 2
  ##   ..$ : chr [1:2] "Ancona" "Castagneto"
  ##   ..$ : chr [1:2] "21/06/01" "23/06/01"
  ##  - attr(*, "call")= language by.data.frame(data = chl, INDICES =
  ##  list(chl$station, chl$date),      FUN = function(x) list(Row =
  ## x$row[which.max(x$chlorophyll)],  ...
  ##  - attr(*, "class")= chr "by"

I've not succeeded (though experience tells me that others could)
in extracting from this something like the following:

  ##        Ancona Castagneto
  ## Row         6          3
  ## Val       2.5        2.4
  ## Date 23/06/01   21/06/01

Questions: (a) What's the trick? (b) How to generalise it?
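
A hedged sketch of one possible answer to both questions, not
necessarily the neatest one. It sidesteps the "by" object altogether:
a small helper (called "max_rows" here, a made-up name) returns the full
row holding the maximum for each group that actually occurs, and the
table is then assembled with rbind(). "chl" is rebuilt from only the
nine example rows quoted below; "best" and "tab" are likewise
illustrative names.

```r
# Reconstruction of the nine example rows quoted in this thread.
chl <- data.frame(
  row         = 1:9,
  station     = c(rep("Castagneto", 4), rep("Ancona", 5)),
  date        = c(rep("21/06/01", 4), rep("23/06/01", 5)),
  depth       = c(-0.5, -1.5, -2.5, -3.5, -0.5, -1.5, -2.5, -3.5, -4.5),
  chlorophyll = c(2.0, 2.2, 2.4, 2.1, 2.4, 2.5, 2.2, 2.1, 1.9),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each combination of the grouping columns
# that occurs in the data, return the row with the first maximum of
# the value column.
max_rows <- function(data, groups, value) {
  f <- lapply(groups, function(g) data[[g]])
  pieces <- split(data, f, drop = TRUE)
  out <- do.call(rbind,
                 lapply(pieces, function(x) x[which.max(x[[value]]), ]))
  rownames(out) <- NULL
  out
}

best <- max_rows(chl, c("station", "date"), "chlorophyll")
best <- best[order(best$station), ]

# Mixing numbers and dates forces a character matrix.
tab <- rbind(Row = best$row, Val = best$chlorophyll, Date = best$date)
colnames(tab) <- best$station
print(tab, quote = FALSE)
```

The generalisation in (b) is then a matter of passing other grouping
columns or another value column to the helper. With several dates per
station the column names would repeat, so the long form ("best") is
probably the safer thing to keep.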

Ted.

> 
>> 
>> Sean
>> 
>> On May 4, 2005, at 11:43 AM, alessandro carletti wrote:
>> 
>> > Sorry for disturbing you with another newbie question!
>> > I have a data frame about coastal waters quality
>> > parameters: for some parameters (e.g. NH3) I have only
>> > 1 observation for each sampling station and each
>> > sampling date, while in other cases (chlorophyll) I
>> > have 1 obs for each meter-depth for each station and
>> > date. How can I select only the max chlorophyll value
>> > for each station/date?
>> >
>> > example
>> >
>> > row  station         date        depth     chlorophyll
>> > 1     Castagneto      21/06/01     -0.5         2.0
>> > 2     Castagneto      21/06/01     -1.5         2.2
>> > 3     Castagneto      21/06/01     -2.5         2.4
>> > 4     Castagneto      21/06/01     -3.5         2.1
>> > 5     Ancona          23/06/01     -0.5         2.4
>> > 6     Ancona          23/06/01     -1.5         2.5
>> > 7     Ancona          23/06/01     -2.5         2.2
>> > 8     Ancona          23/06/01     -3.5         2.1
>> > 9     Ancona          23/06/01     -4.5         1.9
>> > ...
>> >
>> > I'd like to select only row 3 and 6, the ones with max
>> > chlorophyll values, or have the mean for the rows 1:4
>> > and 5:9
>> >
>> > Thanks
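
For reference, Sean's pointer to ?aggregate could be sketched as below
for the two things asked for (the maximum, or the mean, per
station/date). This assumes the data frame is called "chl" and holds
only the nine example rows shown above; "chl_max" and "chl_mean" are
made-up names.

```r
# Reconstruction of the nine example rows from the original question.
chl <- data.frame(
  row         = 1:9,
  station     = c(rep("Castagneto", 4), rep("Ancona", 5)),
  date        = c(rep("21/06/01", 4), rep("23/06/01", 5)),
  depth       = c(-0.5, -1.5, -2.5, -3.5, -0.5, -1.5, -2.5, -3.5, -4.5),
  chlorophyll = c(2.0, 2.2, 2.4, 2.1, 2.4, 2.5, 2.2, 2.1, 1.9),
  stringsAsFactors = FALSE
)

# Maximum chlorophyll for each station/date combination present.
chl_max <- aggregate(chlorophyll ~ station + date, data = chl, FUN = max)

# Mean chlorophyll over all depths for each station/date combination.
chl_mean <- aggregate(chlorophyll ~ station + date, data = chl, FUN = mean)

print(chl_max)
print(chl_mean)
```

Note that aggregate() returns the summary values but not the row IDs,
which is why the by()/which.max approach was suggested for selecting
rows 3 and 6 themselves.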


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 05-May-05                                       Time: 14:13:13
------------------------------ XFMail ------------------------------