Clément,
First, note that aggregate is not about atomic matrices
but about data frames, i.e. not about atomic ff objects but about ffdf
objects.
The easiest thing to do, if you have enough RAM, is to work with
just the few columns you need and read those into RAM as a standard
data.frame.
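A minimal sketch of that in-RAM route, assuming the ff package is installed and using airquality as a stand-in for a large ffdf (in practice the ffdf would already exist and be much bigger). Only the columns needed for the aggregation are read into an ordinary data.frame:

```r
library(ff)

# Stand-in for a large on-disk data frame (assumption: airquality used for demo)
ffair <- as.ffdf(airquality)

# Read only the needed columns into RAM; as.data.frame() makes sure
# the column subset ends up as an ordinary in-memory data.frame
inram <- as.data.frame(ffair[c("Ozone", "Temp", "Month")])

# From here on, plain aggregate() works as usual
res <- aggregate(cbind(Ozone, Temp) ~ Month, data = inram, mean)
res
```

The memory cost is then only that of the selected columns, not of the whole ffdf.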
If you need to work with less RAM, then instead of using the apply
functions for atomic ff objects, you need to aggregate row chunks first
and then aggregate the aggregates.
Example below.
If you want to create a generic solution, it might be wise not to reinvent the wheel and to look at package 'plyr' first.
My understanding is that Hadley Wickham has thought carefully about how to break tasks into pieces and recombine the results.
I have never tried to combine ff with plyr, so go ahead. If a specific
feature in ff is needed to make this possible, please let me know.
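The pattern plyr is built around is usually called split-apply-combine. As a rough illustration of the pattern only (base R, not plyr's own API such as ddply), the data can be split by group, each piece summarised, and the pieces recombined:

```r
# Split-apply-combine in base R (illustration of the pattern, not plyr itself)
pieces <- split(airquality, airquality$Month)          # split by group
summaries <- lapply(pieces, function(d) data.frame(     # apply a summary to each piece
  Month = d$Month[1],
  Ozone = mean(d$Ozone, na.rm = TRUE),
  Temp  = mean(d$Temp,  na.rm = TRUE)
))
result <- do.call(rbind, summaries)                     # combine the pieces
result
```

With ff, the "split" step would have to be done per row chunk, which is why the chunk results must carry enough information (e.g. counts) to be recombined.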
Jens Oehlschlägel
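# here is a simple aggregate example
aggregate(cbind(Ozone, Temp) ~ Month, data = airquality, mean)
# in order to aggregate chunked results we need not only the chunk means but also the number of valid observations
nmean <- function(x) c(mean = mean(x), nvalid = sum(!is.na(x)))
aggregate(cbind(Ozone, Temp) ~ Month, data = airquality, nmean)
# let's create an ffdf
library(ff)
ffair <- as.ffdf(airquality[sample(nrow(airquality)), ])
# and define a chunking with two chunks (very small ones for demo here)
cs <- chunk(ffair, length = 2)
# now we can apply our aggregate statement to each chunk;
# do.call(data.frame, ...) flattens the mean/nvalid matrix columns into ordinary columns
chunkres <- lapply(cs, function(i) {
  do.call(data.frame, aggregate(cbind(Ozone, Temp) ~ Month, data = ffair[i, ], nmean))
})
# aggregating the chunked results is nothing specific to ff:
# stack the chunk results and weight each chunk mean by its number of valid observations
chunkres <- do.call(rbind, chunkres)
with(chunkres, data.frame(
  Ozone = tapply(Ozone.mean * Ozone.nvalid, Month, sum) / tapply(Ozone.nvalid, Month, sum),
  Temp = tapply(Temp.mean * Temp.nvalid, Month, sum) / tapply(Temp.nvalid, Month, sum)
))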
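The original question was about an atomic matrix split by a factor rather than a data frame. A minimal sketch of the same chunk-then-combine idea for that case, using base R's rowsum() on an in-RAM matrix as a stand-in for an ff matrix (assumption: with ff, each row chunk would be read from disk instead). Sums and counts are additive across chunks, so group means can be recovered at the end; the names x, g and the chunk size of 300 are hypothetical.

```r
# Chunked group means for a numeric matrix (in-RAM stand-in for an ff matrix)
set.seed(42)
n <- 1000
x <- matrix(rnorm(3 * n), ncol = 3, dimnames = list(NULL, c("a", "b", "c")))
g <- factor(sample(c("lo", "mid", "hi"), n, replace = TRUE))

# Row ranges of at most 300 rows each
starts <- seq(1, n, by = 300)
ends <- pmin(starts + 299, n)

# Per chunk: group sums of every column, plus a column "n" holding group row counts
parts <- Map(function(from, to) {
  rows <- from:to
  rowsum(cbind(x[rows, , drop = FALSE], n = 1), g[rows])
}, starts, ends)

# Combine: stack the chunk results and sum them again by group (row names)
stacked <- do.call(rbind, parts)
totals <- rowsum(stacked, rownames(stacked))

# Group means = summed values / summed counts (the count vector recycles down rows)
means <- totals[, c("a", "b", "c")] / totals[, "n"]
means
```

The small per-group result could then be written back to disk, for example with ff's as.ff(), if an ff object is really required.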
-----Original Message-----
From: clement <clement.tisseuil at gmail.com>
Sent: Feb 10, 2011 3:29:21 PM
To: "R SIG High Performance Computing" <r-sig-hpc at r-project.org>
Subject: [R-sig-hpc] ff: "aggregate" function for ff matrix ?
Hello,
Playing around with the ff package, I wonder whether it would be possible
to develop functions like "aggregate" based on the ffcolapply or ffapply
functions of ff, which would split a big ff matrix into subsets according
to the levels of a factor, compute summary statistics for each level,
and return the result as an ff object?
Thanks in advance.
Regards
--
Clément Tisseuil
_______________________________________________
R-sig-hpc mailing list
R-sig-hpc at r-project.org
[https://stat.ethz.ch/mail]