Fast Normalize by Group

Try the 'data.table' package. It takes about 0.1 seconds to normalize the data.
> x <- data.frame(id = sample(10000, 100000, TRUE), value = runif(100000))
> require(data.table)
Loading required package: data.table
data.table 1.8.2  For help type: help("data.table")
> system.time({
+     x <- data.table(x)
+     newX <- x[
+         , list(value = value  # keep original value
+             , normValue = value / sum(value)
+             )
+         , by = id
+         ]
+ })
   user  system elapsed
   0.03    0.01    0.11
> head(newX, 20)
      id     value   normValue
 1: 8094 0.6805425 0.101140797
 2: 8094 0.3154233 0.046877543
 3: 8094 0.8998646 0.133735993
 4: 8094 0.8858863 0.131658564
 5: 8094 0.1859526 0.027635892
 6: 8094 0.4694456 0.069768023
 7: 8094 0.9302886 0.138257544
 8: 8094 0.7482040 0.111196505
 9: 8094 0.9052426 0.134535255
10: 8094 0.4650028 0.069107739
11: 8094 0.2428116 0.036086145
12: 6287 0.1979209 0.037505820
13: 6287 0.5117723 0.096980353
14: 6287 0.6425769 0.121767688
15: 6287 0.0397795 0.007538177
16: 6287 0.1255722 0.023795811
17: 6287 0.5606742 0.106247214
18: 6287 0.4818579 0.091311594
19: 6287 0.3913614 0.074162596
20: 6287 0.4622984 0.087605098
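
If adding a package is not an option, base R can probably do the same thing. Below is a minimal sketch, not a benchmarked solution, using ave(). It assumes the same column names (id, value) as the example above; the set.seed() call and the final sanity check are additions for illustration only.

```r
# Assumed example data, same shape as above: up to 10,000 groups, 100,000 rows
set.seed(1)
x <- data.frame(id = sample(10000, 100000, TRUE), value = runif(100000))

# ave() applies FUN within each group and returns a vector
# aligned with the original row order
x$normValue <- ave(x$value, x$id, FUN = function(v) v / sum(v))

# Sanity check: the normalized values should sum to 1 within every group
range(tapply(x$normValue, x$id, sum))
```

Unlike the data.table result, this keeps the rows in their original order rather than collecting them by group.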

Hi,

I have a very large data set (approx. 100,000 rows).

The data come from around 10,000 "groups" with about 10 entries per group.

The values are in one column; the group ID is an integer in the second column.

I want to normalize the values by group:

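normalized <- numeric(length(x))
for (g in unique(group)) {
        # divide each value by the sum of its own group
        normalized[group == g] <- x[group == g] / sum(x[group == g])
}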

This works fine in a loop, but it is slow.  Is there a faster way to do this?

Thanks!

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.