function to filter identical data.frames using less than (<) and greater than (>)

7 messages · Rui Barradas, Jeff Newmiller, Karl Brand

#
Esteemed UseRs,

I've got many biggish data frames which need a lot of subsetting, like in
this example:

# example
eg <- data.frame(A = rnorm(10), B = rnorm(10), C = rnorm(10), D = rnorm(10))
egsub <- eg[eg$A < 0 & eg$B < 1 & eg$C > 0, ]
egsub
egsub2 <- eg[eg$A > 1 & eg$B > 0, ]
egsub2

# To make this clearer than 1000s of lines of extractions with []
# I tried to make a function like this:

# func(data="eg", A="< 0", B="< 1", C="> 0")

# Which would also need to be run as

# func(data="eg", A="> 1", B="> 0", C=NA)
#end

Notably:
- the signs* "<" and ">" need to be flexible _and_ optional
- the quantities also need to be flexible
- the column header names, i.e., A, B and C, don't need flexibility
  and can remain fixed
* "less than" and "greater than" so google picks up this thread

Once again i find just how limited my grasp of R is... Is do.call() the
best way to call binary operators like < and > in a function? Is an ifelse
statement needed for each column to make filtering on it optional? etc.

Anyone with the patience to show their working version of such a
function would receive my undying Rdulation. With thanks in advance,

Karl
#
Hello,

Something like this?


func <- function(data, A, B, C){
     f <- function(a)
         function(x) eval(parse(text = paste("x", a)))
     iA <- if(is.na(A)) TRUE else f(A)(data$A)
     iB <- if(is.na(B)) TRUE else f(B)(data$B)
     iC <- if(is.na(C)) TRUE else f(C)(data$C)
     data[iA & iB & iC, ]
}

func(eg, "> 0", NA, NA)
func(data=eg, A="< 0", B="< 1", C="> 0")
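
A variation on the same idea, offered only as a sketch: instead of eval(parse()), the condition string can be split into an operator and a number, and the operator looked up with match.fun(). The names parse_cond and filter_abc are hypothetical, and the sketch assumes each condition is written as an operator, a space, and a number (e.g. "< 0").

```r
# Hypothetical helper: turn a string such as "< 0" into a logical vector for x.
# NA (or NULL) means "do not filter on this column".
parse_cond <- function(x, cond) {
  if (is.null(cond) || is.na(cond)) return(rep(TRUE, length(x)))
  # Assumption: operator and number are separated by whitespace.
  parts <- strsplit(trimws(cond), "[[:space:]]+")[[1]]
  stopifnot(length(parts) == 2, parts[1] %in% c("<", ">", "<=", ">="))
  op <- match.fun(parts[1])        # look up the binary operator as a function
  op(x, as.numeric(parts[2]))
}

# Hypothetical wrapper with the same call shape as func() above;
# columns left at their NA default are not filtered.
filter_abc <- function(data, A = NA, B = NA, C = NA) {
  keep <- parse_cond(data$A, A) & parse_cond(data$B, B) & parse_cond(data$C, C)
  data[keep, , drop = FALSE]
}

set.seed(1)  # only to make the example reproducible
eg <- data.frame(A = rnorm(10), B = rnorm(10), C = rnorm(10), D = rnorm(10))
filter_abc(eg, A = "< 0", B = "< 1", C = "> 0")
filter_abc(eg, A = "> 1", B = "> 0")
```

Restricting the operator to a fixed set means arbitrary text is never evaluated as code, at the cost of accepting only the simple "operator number" form.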


Hope this helps,

Rui Barradas
On 06-12-2012 13:49, Karl Brand wrote:
#
You have not indicated why the subset function is insufficient for your needs...
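
For reference, a minimal sketch of how the two extractions from the original question might look with subset(), assuming the same eg data frame (the seed is added here only for reproducibility):

```r
set.seed(1)
eg <- data.frame(A = rnorm(10), B = rnorm(10), C = rnorm(10), D = rnorm(10))

# subset() evaluates the condition inside the data frame,
# so the eg$ prefix does not have to be repeated for every column.
egsub  <- subset(eg, A < 0 & B < 1 & C > 0)
egsub2 <- subset(eg, A > 1 & B > 0)
egsub
egsub2
```

One difference from "[" worth keeping in mind: subset() drops rows where the condition evaluates to NA, whereas logical indexing with "[" returns them as rows of NA.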
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.
Karl Brand <k.brand at erasmusmc.nl> wrote:

            
#
Rui,

Indeed it does help. Also very happy to see eval() and parse() employed 
and demystified here.

In Rdulation,

Karl
On 06/12/12 15:48, Rui Barradas wrote:

  
    
#
Hi Jeff,

Subset is indeed what's required here. But using it every time it's
needed was generating excessive amounts of obtuse code. So for the sake
of clarity and convenience i wanted a wrapper function to replace these
repetitious subsets.

Although Rui's example works just fine, i'd love to see any idiomatic ways
you might attempt this (also for the sake of improving my grasp of R).

Cheers,

Karl
On 06/12/12 15:57, Jeff Newmiller wrote:

  
    
#
You ask me to provide code when you have only described your solution rather than your problem. That limits my options more than I care to allow for investing my time.

When I think of problems that require repetitive subsetting I tend to look for solutions involving aggregation (?aggregate, ?plyr::ddply), which requires creating one or more grouping columns which can be formulated with the cut function or with logical indexed assignment (e.g. a sequence of statements something like eg$grpcol[with(eg,grpcol!="Default" & A<1 & B<1)] <- "ABTooLow").
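
A rough sketch of that grouping approach, under some assumptions not stated in the thread: grpcol starts out as "Default" for every row, the relabelling condition is written as grpcol == "Default" so that only still-unlabelled rows are touched, and the bin edges and labels passed to cut() are made up for illustration.

```r
set.seed(1)
eg <- data.frame(A = rnorm(10), B = rnorm(10), C = rnorm(10), D = rnorm(10))

# Start every row in a default group, then relabel rows by logical indexing.
eg$grpcol <- "Default"
eg$grpcol[with(eg, grpcol == "Default" & A < 1 & B < 1)] <- "ABTooLow"

# Alternatively, bin a single column with cut() (illustrative break points).
eg$Abin <- cut(eg$A, breaks = c(-Inf, 0, 1, Inf),
               labels = c("neg", "mid", "high"))

# Summarise once per group instead of extracting each subset separately.
aggregate(D ~ grpcol, data = eg, FUN = mean)
```

With the grouping column in place, each "subset" is just one level of grpcol, so summaries for all groups come from a single aggregate() call.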

So... what is your problem?
Karl Brand <k.brand at erasmusmc.nl> wrote:

            
#
My problem is that using "[" every time i want to extract my data of
interest is cumbersome and verbose for the next guy suffering through
reading my code. Since my extractions are always on the same columns and
depend on either "<", ">" or neither, a wrapper function or perhaps a
different function besides "[" will likely solve my problem. Indeed
Rui's example achieves exactly what i wanted. Keep in mind my grasp of R
remains limited and you might think my problem is more complex than it
is. So i'm only inviting further solutions to this problem for the sake
of improving my grasp of R. You certainly have my understanding should
this go beyond what you might invest your time in :)

No less, your example code:

eg$grpcol[with(eg,grpcol!="Default" & A<1 & B<1)] <- "ABTooLow"

already provides educational material for me, thank you.

Chrs, K
On 06/12/12 18:00, Jeff Newmiller wrote: