help with mysql and R: partitioning by quintile

9 messages · jim holtman, Phil Spector, Dennis Murphy +1 more

#
try this:
+               , track = rep(1:20, 20)
+               , freq = floor(runif(400, 10, 200))
+               , stringsAsFactors = FALSE
+               )
+               list(userid = userid
+                  , freq = freq
+                  , rating = findInterval(freq
+                                        # use track as index into quantile matrix
+                                        , tqm[as.character(track[1L]),]
+                                        , rightmost.closed = TRUE
+                                        ) + 1L
+                  )
+              , by = track]
     track userid freq rating
[1,]     1     u1   10      1
[2,]     1     u2   15      1
[3,]     1     u3  126      4
[4,]     1     u4  117      3
[5,]     1     u5   76      2
[6,]     1     u6  103      3

        
On Sun, May 8, 2011 at 2:48 PM, gj <gawesh at gmail.com> wrote:

  
    
#
One way to get the ratings would be to use the ave() function:

rating = ave(x$freq, x$track,
             FUN = function(x) cut(x, quantile(x, (0:5)/5), include.lowest = TRUE))
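
A self-contained sketch of this approach, for anyone who wants to run it. The data frame below is made up for illustration (20 tracks with 20 distinct play counts each), and as.integer() is added here only to make the conversion from cut()'s factor to a 1-5 rating explicit.

```r
set.seed(1)

# Hypothetical data: 20 tracks, 20 users per track, distinct counts within a track
x <- data.frame(
  track  = rep(1:20, each = 20),
  userid = paste0("u", rep(1:20, times = 20)),
  freq   = unlist(lapply(1:20, function(i) sample(10:199, 20))),
  stringsAsFactors = FALSE
)

# Within each track, bin freq at its own quintiles and keep the bin number
x$rating <- ave(x$freq, x$track, FUN = function(v)
  as.integer(cut(v, quantile(v, (0:5)/5), include.lowest = TRUE)))

table(x$rating)
```

Note that cut() stops with a "'breaks' are not unique" error when two quantiles of a track coincide, which can happen when a track has many tied counts.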

 					- Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu
On Sun, 8 May 2011, gj wrote:

            
5 days later
#
Hi:

Is this what you're after?

tq <- with(ds, quantile(freq, seq(0.2, 1, by = 0.2)))
ds$int <- with(ds, cut(freq, c(0, tq)))
with(ds, table(int))

int
 (0,1]  (1,2]  (2,4]  (4,7] (7,16]
    10      6      7      6      6

HTH,
Dennis
On Sat, May 14, 2011 at 9:42 AM, gj <gawesh at gmail.com> wrote:
#
An easy way is to just offset the quantiles by a small increment so
that the boundary condition is less likely.  If you change the line
in my example to

tqm <- do.call(rbind, tq) + 0.001

that should do the trick.
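
To see what the offset does, here is a small sketch with made-up break points (not the data from this thread). findInterval() treats its intervals as closed on the left, so a count that equals a break lands in the upper bin; nudging the breaks up by 0.001 moves it into the lower bin instead. This assumes the counts are whole numbers, so the offset cannot move any other value.

```r
breaks <- c(20, 40, 60, 80)        # hypothetical quintile boundaries for one track
freq   <- c(10, 40, 41, 80, 150)   # 40 and 80 sit exactly on a boundary

before <- findInterval(freq, breaks) + 1L          # boundary values go to the upper bin
after  <- findInterval(freq, breaks + 0.001) + 1L  # boundary values go to the lower bin

rbind(freq, before, after)
```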
On Sat, May 14, 2011 at 6:09 PM, gj <gawesh at gmail.com> wrote: