Fast Normalize by Group

8 messages · Noah Silverman, Peter Langfelder, Mikołaj Hnatiuk +4 more

#
Hi,

I have a very large data set (approx. 100,000 rows).

The data comes from around 10,000 "groups" with about 10 entries per group.

The values are in one column, the group ID is an integer in the second column.

I want to normalize the values by group:

for (g in unique(group)) {
	# divide each value by the sum of its own group
	x[group == g] <- x[group == g] / sum(x[group == g])
}

This works fine in a loop, but is slow.  Is there a faster way to do this?

Thanks!
#
Not tested but should work:

sums = tapply(x, group, sum);
sums.ext = sums[ match(group, names(sums))]
normalized = x/sums.ext

tapply may turn out to be just as slow as your loop, though; I'm not sure.
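
A self-contained sketch of the approach above, run on simulated data (the vectors x and group below are made up for illustration and are not the original poster's data). It also shows rowsum() as another way to get the per-group totals. Whether either is faster than the loop on a given machine is best checked with system.time().

```r
# Simulated data: 10,000 groups with 10 values each (hypothetical stand-in)
set.seed(1)
group <- sample(rep(1:10000, each = 10))
x <- runif(length(group))

# Per-group sums via tapply(), expanded back to one entry per row
sums <- tapply(x, group, sum)
sums.ext <- sums[match(group, names(sums))]
normalized <- x / as.vector(sums.ext)

# Alternative: rowsum() returns a one-column matrix whose row names are the group labels
sums2 <- rowsum(x, group)
normalized2 <- x / unname(sums2[as.character(group), 1])

# Sanity check: the normalized values in each group should sum to 1
check <- tapply(normalized, group, sum)
```

If the two results agree and every entry of check is 1 (up to floating-point error), the vectorized version is doing the same job as the loop.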

HTH,

Peter
On Thu, Nov 29, 2012 at 10:55 AM, Noah Silverman <noahsilverman at ucla.edu> wrote:
#
Hello,

If you want one value per group, use tapply(); if you want one value per
value of x, use ave().

tapply(x, group, FUN = function(.x) .x/sum(.x))
ave(x, group, FUN = function(.x) .x/sum(.x))
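
To make the difference between the two calls concrete, here is a small sketch on made-up vectors (x and group below are illustrative values, not from the original question). With this FUN, tapply() returns one list element per group, while ave() returns a vector aligned with x, which is the shape needed for normalizing in place.

```r
# Tiny made-up example: three groups of sizes 2, 2 and 1
x <- c(1, 3, 2, 2, 5)
group <- c(1, 1, 2, 2, 3)

# tapply(): one list element per group, each holding that group's normalized values
by_group <- tapply(x, group, FUN = function(.x) .x / sum(.x))

# ave(): a plain vector the same length as x, in the original row order
per_row <- ave(x, group, FUN = function(.x) .x / sum(.x))

str(by_group)
print(per_row)
```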


Hope this helps,

Rui Barradas
On 29-11-2012 18:55, Noah Silverman wrote:
#
Try the 'data.table' package.  It takes about 0.1 seconds to normalize the data.
Loading required package: data.table
data.table 1.8.2  For help type: help("data.table")
> system.time({
+     x <- data.table(x)
+     newX <- x[
+         , list(value = value  # keep original value
+             , normValue = value / sum(value)
+             )
+         , by = id
+         ]
+ })
   user  system elapsed
   0.03    0.01    0.11
      id     value   normValue
 1: 8094 0.6805425 0.101140797
 2: 8094 0.3154233 0.046877543
 3: 8094 0.8998646 0.133735993
 4: 8094 0.8858863 0.131658564
 5: 8094 0.1859526 0.027635892
 6: 8094 0.4694456 0.069768023
 7: 8094 0.9302886 0.138257544
 8: 8094 0.7482040 0.111196505
 9: 8094 0.9052426 0.134535255
10: 8094 0.4650028 0.069107739
11: 8094 0.2428116 0.036086145
12: 6287 0.1979209 0.037505820
13: 6287 0.5117723 0.096980353
14: 6287 0.6425769 0.121767688
15: 6287 0.0397795 0.007538177
16: 6287 0.1255722 0.023795811
17: 6287 0.5606742 0.106247214
18: 6287 0.4818579 0.091311594
19: 6287 0.3913614 0.074162596
20: 6287 0.4622984 0.087605098
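
The transcript above builds a new table with list(...). As a variation, data.table can also add the normalized column to the existing table by reference using :=. The sketch below assumes the data.table package is installed (it is skipped otherwise) and uses simulated data; the table name dt and its contents are made up for illustration, and timings will differ by machine.

```r
# Runs only if data.table is available; otherwise the block is skipped
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)

  # Simulated data: 10,000 ids with 10 values each (hypothetical stand-in)
  set.seed(1)
  dt <- data.table(id = sample(rep(1:10000, each = 10)),
                   value = runif(100000))

  # Add normValue in place, computed within each id
  print(system.time(
    dt[, normValue := value / sum(value), by = id]
  ))

  print(head(dt))
}
```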

        
On Thu, Nov 29, 2012 at 1:55 PM, Noah Silverman <noahsilverman at ucla.edu> wrote:
#
On 29-11-2012, at 19:55, Noah Silverman wrote:

Toy example:

gx <- data.frame(group=rep(1:4,each=3), x=1:12)
gx
gx$x <- ave(gx$x, gx$group, FUN=function(x) x/sum(x))
gx
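
A quick way to confirm the toy example does what is wanted: after normalizing, the values within each group should sum to 1. The sketch below rebuilds the same toy data frame and stores the result in a separate column (the column name norm is chosen here for illustration) so the original x stays available for comparison.

```r
gx <- data.frame(group = rep(1:4, each = 3), x = 1:12)

# Keep the original x and put the normalized values in a new column
gx$norm <- ave(gx$x, gx$group, FUN = function(x) x / sum(x))

# Sum of normalized values per group; each entry should be 1
group_totals <- tapply(gx$norm, gx$group, sum)
print(group_totals)
```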


Berend
#
Hi all,

I am very new to R. Can someone please suggest some tutorial
links for understanding SVMs using R?

Regards,
Vivek