Equivalent of 'first.var' or 'last.var' from SAS in R?

4 messages · Matthew Pettis, Peter Dalgaard, Hadley Wickham

#
Hi,

I want to sort a data frame by multiple columns and then take the
first record in each unique level of the "by" group I used to sort the
data frame.  Does someone have an example of how to do this?

Thanks,
Matt
#
Matthew Pettis wrote:
Something like this

 > aggregate(airquality,airquality["Month"],head,1)
  Month Ozone Solar.R Wind Temp Month Day
1     5    41     190  7.4   67     5   1
2     6    NA     286  8.6   78     6   1
3     7   135     269  4.1   84     7   1
4     8    39      83  6.9   81     8   1
5     9    96     167  6.9   91     9   1

where you probably want to lose the first column.
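
A minimal sketch of dropping that extra grouping column, assuming the result is stored first (the name `res` is only for illustration and is not from the original post):

```r
# aggregate() puts the grouping column first, so Month shows up twice.
res <- aggregate(airquality, airquality["Month"], head, 1)

# Negative indexing drops the first (grouping) copy of Month
# and keeps the original airquality column order.
first_by_month <- res[-1]
first_by_month
```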

or

 > aq <- airquality
 > unsplit(lapply(split(aq,aq$Month), head,1),5:9)
    Ozone Solar.R Wind Temp Month Day
1      41     190  7.4   67     5   1
32     NA     286  8.6   78     6   1
62    135     269  4.1   84     7   1
93     39      83  6.9   81     8   1
124    96     167  6.9   91     9   1

This also works, but the "tail" variant is harder:

 > unsplit(lapply(split(aq,aq$Month), "[",1,),5:9)
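
For the "tail" case, here is a rough sketch of two base R options, assuming `aq` is a copy of `airquality` and that Month and Day are the sort keys (the names `aq_sorted`, `first_rows`, `last_rows` and `last_rows2` are made up for this example). The `duplicated()` form is probably the closest analogue to SAS first.var / last.var; the `"["` form needs each piece's own row count, which is why it is more awkward than the `head` version.

```r
aq <- airquality  # built-in dataset; `aq` is just a shorthand

# Sort by the "by" columns first, as in the original question.
aq_sorted <- aq[order(aq$Month, aq$Day), ]

# first.Month: keep rows whose Month value has not been seen yet.
first_rows <- aq_sorted[!duplicated(aq_sorted$Month), ]

# last.Month: the same test, scanning from the bottom up.
last_rows <- aq_sorted[!duplicated(aq_sorted$Month, fromLast = TRUE), ]

# The "[" variant of tail: index each piece by its own number of rows.
last_rows2 <- do.call(rbind,
                      lapply(split(aq_sorted, aq_sorted$Month),
                             function(d) d[nrow(d), ]))
```

With more than one grouping column, `duplicated()` can be given a data frame of just those columns instead of a single vector.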
#
Thanks to Peter and Phil, this is indeed what I had in mind.

On Thu, Sep 25, 2008 at 2:26 PM, Peter Dalgaard
<p.dalgaard at biostat.ku.dk> wrote:
#
On Thu, Sep 25, 2008 at 2:00 PM, Matthew Pettis
<matthew.pettis at gmail.com> wrote:
In the (very soon to be released) plyr package, you can do:

library(plyr)
ddply(airquality, .(Month), head, 1)
ddply(airquality, .(Month), tail, 1)

Hadley