Skip to content
Prev 68462 / 398506 Next

large dataset import, aggregation and reshape

Christoph Lehmann a ??crit :
I would try the latter and use and SQL interface such as RODBC or 
RMySQL. You can send your aggregation and reshape commands to the 
external database as an SQL query.

Example with a database I have at hand. The table "datemesu" has 640,000 
rows and 5 columns, the field "mesure" being a factor with 2 levels, "N" 
and "P".

 > library(RODBC)
 > fil <- "C:/Archives/Baobab/Baobab2000.mdb"
 > chann <- odbcConnectAccess(fil)
 > quer <- paste("SELECT numani, SUM(IIF(mesure = 'P', 1, 0)) AS wt,",
+                      "SUM(IIF(mesure = 'N', 1, 0)) AS bcs,",
+                      "MIN(date) AS minDate",
+               "FROM datemesu",
+               "GROUP BY numani")
 > system.time(tab <- sqlQuery(chann, quer), gcFirst = TRUE)
[1] 11.16  0.19 11.54    NA    NA
 > odbcCloseAll()
 >
 > dim(tab)
[1] 69360     4
 > head(tab)
        numani wt bcs    minDate
1 SNFLCA00001  1   0 1987-01-23
2 SNFLCA00002  2   0 1987-01-10
3 SNFLCA00004  1   0 1987-01-10
4 SNFLCA00006  4   0 1987-02-02
5 SNFLCA00007  4   0 1987-02-18
6 SNFLCA00008  3   0 1987-01-09


Best,

Renaud