loop in a data.table
Hi,
On Wed, Mar 13, 2013 at 7:25 PM, Camilo Mora <cmora at dal.ca> wrote:
Hi everyone,
I have a data.table called "data" with many columns which I want to group by
column1 using data.table, given how fast it is.
The problem with looping over a data.table is that data.table does not accept
quoted column names (e.g. "col2" instead of col2). I
found a workaround, which is to use get("col2"). It works fine, but the
processing time increases by a factor of 20.
So if I use:
data[,sum(col2),by=(key)]
typing the column names by hand, the operation takes 1 sec. If
instead I use:
data[,sum(get("col2")),by=(key)]
with a loop supplying the column names, the same operation takes 20 sec. I
cannot use the former approach because I have 100000 files to process, but the
latter will simply take months to complete. Is there any alternative to the
function "get", or any other way in which data.table can recognize the names
of the columns?
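A minimal reproducible sketch of the situation described above, of the kind a reply might ask for. It assumes the data.table package is installed and uses a hypothetical toy table with a grouping column `grp` and numeric columns `col2` and `col3`; none of these names come from the real data. The third form, passing names as strings through `.SDcols` and operating on `.SD`, is one commonly used alternative to `get()` and is included as an assumption, not as something proposed in this thread. No timings are claimed here.

```r
library(data.table)

# Hypothetical toy table: one grouping column and two numeric columns
set.seed(1)
dt <- data.table(grp  = sample(letters[1:3], 1e5, replace = TRUE),
                 col2 = runif(1e5),
                 col3 = runif(1e5))

# 1. Column name typed by hand (the fast form from the question)
by_hand <- dt[, sum(col2), by = grp]

# 2. Column name held in a string and looked up with get() (the slow form)
col <- "col2"
with_get <- dt[, sum(get(col)), by = grp]

# 3. Assumed alternative: give the column names as a character vector
#    via .SDcols and apply sum() to each column of .SD
cols <- c("col2", "col3")
with_sdcols <- dt[, lapply(.SD, sum), by = grp, .SDcols = cols]
```

All three forms should produce the same per-group sums for `col2`; the `.SDcols` form also handles several columns in one call, which may remove the need for an outer loop over column names.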
I'm still not sure what you're trying to do. Could you maybe create an example that's a bit closer to your real data and the stuff you want to do on it? Are all the columns of the same type? Are you just summing columns? If you post code into an email that reconstructs a small version of your data.table (maybe 5-10 columns and one or two groups), it'd be clearer to me. Thanks, -steve
Steve Lianoglou Defender of The Thesis | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact