searching for data.frame rows / processing of rows

I tought that aggregate was the way to go, but only for large dataframes it is 
faster.
  df <- read.table(stdin(),header=TRUE)
0: Location   Time    X    Y
1:   1          0      1.6  9.3
2:   1          3      4.2  10.4
3:   1          6      2.7  16.3
4:   2          0      0.5  2.1
5:   2          3      NA   3.6
6:   2          3      5.0  0.06
7:   2          6      3.4  14.0
8:
aggregate(df[,3:4],df[,1:2],FUN=mean,na.rm=TRUE)
Location Time   X     Y
1        1    0 1.6  9.30
2        2    0 0.5  2.10
3        1    3 4.2 10.40
4        2    3 5.0  1.83
5        1    6 2.7 16.30
6        2    6 3.4 14.00
 system.time(  aggregate(df[,3:4],df[,1:2],FUN=mean,na.rm=TRUE)  )
[1] 0.008 0.000 0.008 0.000 0.000
 system.time(  do.call(rbind, by(df, df[2:1], colMeans, na.rm = TRUE)))
[1] 0.005 0.000 0.005 0.000 0.000

df <- data.frame(Location=rep(1:50,50),
+                  Time=sample(rep(1:50,each=10),2500,replace=TRUE),
+                  X=runif(2500),Y=runif(2500))
 system.time(  aggregate(df[,3:4],df[,1:2],FUN=mean,na.rm=TRUE)  )
[1] 0.162 0.000 0.163 0.000 0.000
 system.time(  do.call(rbind, by(df, df[2:1], colMeans, na.rm = TRUE)))
[1] 2.179 0.006 2.216 0.000 0.000

Kees
Grouping the data frame by the first two columns, apply colMeans
and then rbind the resulting by-structure together:

do.call(rbind, by(DF, DF[2:1], colMeans, na.rm = TRUE))

On 10/5/06, Greg Tarpinian <sasprog474 at yahoo.com> wrote:
R 2.3.1, WinXP:

I have a puzzling problem that I suspect may be solved using
grep or a regular expression but am at a loss how to actually do it...
My data.frame looks like

 Location   Time    X    Y
 --------   ----   ---  ---
 1          0      1.6  9.3
 1          3      4.2  10.4
 1          6      2.7  16.3
 2          0      0.5  2.1
 2          3      NA   3.6
 2          3      5.0  0.06
 2          6      3.4  14.0

and so forth.  I would like to search for duplicate Time values
within a Location and take the numerical average (where possible)
of the elements in X and Y.  These numerical averages should
then be used to create a single row where multiple rows once
existed.  So, I would like to obtain

 2          3      5.0  1.83

for the two Time = 3 rows for Location = 2 and use it to replace
these two rows.  Ideally, avoiding for(i in 1:blah) loops would be
nice because the data.frame has about 10,000 rows that need to
be searched and processed.  My intent is to do some comparing of
SAS to R -- the DATA step processing in SAS is quite fast and
using the RETAIN statement along with the LAG( ) function allows
this sort of thing to be done rapidly.

Thanks in advance,

      Greg

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.