How to replicate SAS by group processing in R
On Oct 10, 2012, at 11:09 AM, ramoss wrote:
Hello, I am trying to re-code all my programs from SAS into R. In SAS I use the following code:

    proc sort data=upper;
      by tdate stock_symbol expire strike;
    run;

    data upper1;
      set upper;
      by tdate stock_symbol expire strike;
I must have forgotten my SAS. (It was a long time ago, I will admit.) Would that have succeeded with the inclusion of 'strike' in that 'by' list?
      if first.expire then output;
      rename strike=astrike;
    run;

on the following data set:

    tdate      stock_symbol  expiration  strike
    9/11/2012  C             9/16/2012   11
    9/11/2012  C             9/16/2012   12
    9/11/2012  C             9/16/2012   13
    9/12/2012  C             9/16/2012   14
    9/12/2012  C             9/16/2012   15
    9/12/2012  C             9/16/2012   16
    9/12/2012  C             9/16/2012   17

to get the following results:

    tdate      stock_symbol  expiration  strike
    9/11/2012  C             9/16/2012   11
    9/12/2012  C             9/16/2012   14
    dat[tapply(1:nrow(dat), list(dat$stock_symbol, dat$tdate),
               FUN = function(x) head(x, 1)), ]

          tdate stock_symbol expiration strike
    1 9/11/2012            C  9/16/2012     11
    4 9/12/2012            C  9/16/2012     14
How would I replicate this kind of logic in R? I have seen PLY & data.table packages mentioned but don't see how they would do the job.
You must mean the 'plyr' package; there is no 'PLY'. I'm sure the 'ddply' function or data.table could do this. Here's another way, using the R 'by' function and then row-binding its result with 'do.call':
    do.call(rbind, by(dat, list(dat$stock_symbol, dat$tdate),
                      FUN = function(x) head(x, 1)))

          tdate stock_symbol expiration strike
    1 9/11/2012            C  9/16/2012     11
    4 9/12/2012            C  9/16/2012     14
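For a closer step-by-step match to the SAS program, here is a minimal base-R sketch. It is one possible translation and not the only one. It assumes the data frame is called 'dat' with the column names from the printed table (so 'expiration' rather than the 'expire' used in the SAS code), and that both date columns are month/day/year strings. Unlike the two calls above, it also includes 'expiration' in the grouping key, which is what 'first.expire' tests in SAS. The name 'upper1' is borrowed from the SAS data step purely for illustration.

```r
# Rebuild the posted example data (column names taken from the printed table).
dat <- data.frame(
  tdate        = c(rep("9/11/2012", 3), rep("9/12/2012", 4)),
  stock_symbol = "C",
  expiration   = "9/16/2012",
  strike       = 11:17,
  stringsAsFactors = FALSE
)

# Counterpart of PROC SORT: order by the BY variables. The dates are
# converted first so they sort chronologically rather than as text.
# The m/d/Y format is an assumption based on the posted values.
td <- as.Date(dat$tdate, format = "%m/%d/%Y")
ex <- as.Date(dat$expiration, format = "%m/%d/%Y")
dat <- dat[order(td, dat$stock_symbol, ex, dat$strike), ]

# Counterpart of "if first.expire then output": keep the first row of
# each tdate / stock_symbol / expiration combination.
keys <- dat[, c("tdate", "stock_symbol", "expiration")]
upper1 <- dat[!duplicated(keys), ]

# Counterpart of "rename strike=astrike".
names(upper1)[names(upper1) == "strike"] <- "astrike"
upper1
```

Because duplicated() flags every row whose key has already been seen, negating it keeps exactly the first row per group, provided the data have been sorted beforehand as PROC SORT would do.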
David Winsemius, MD
Alameda, CA, USA