
Find all numbers in a certain interval

8 messages · Antje, Duncan Murdoch, David Winsemius +2 more

#
Hi all,

I'd like to know whether I can solve this with a shorter command:

a <- rnorm(100)
which(a > -0.5 & a < 0.5)

# would give me all indices of numbers greater than -0.5 and smaller than +0.5

I have something similar with a data frame, and it sometimes produces quite long 
commands.
I'd like to have something like:

which(within.interval(a, -0.5, 0.5))

Is there anything I could use for this purpose?


Antje
#
Antje wrote:
Not in general, but in this particular case "abs(a) < 0.5" gives you the 
right result.

By the way, some advice I read many years ago (in Kernighan and 
Plauger):  always use < or <=, avoid > or >= in multiple comparisons.  
It's easier to read

-0.5 < a & a < 0.5

than it is to read the form you used, because it is so much like the 
math notation -0.5 < a < 0.5.
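
As a small check of the points above, the sketch below compares the three spellings on the vector from the original question. The fixed seed and the idx_* variable names are assumptions added for illustration; the abs() shortcut only applies because the interval is symmetric about zero.

```r
set.seed(1)          # assumption: fixed seed so the run is repeatable
a <- rnorm(100)

idx_original <- which(a > -0.5 & a < 0.5)   # form used in the question
idx_ordered  <- which(-0.5 < a & a < 0.5)   # reads like -0.5 < a < 0.5
idx_abs      <- which(abs(a) < 0.5)         # only valid for (-t, t) intervals

# All three select the same positions
identical(idx_original, idx_ordered)
identical(idx_original, idx_abs)
```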

Duncan Murdoch
#
It's not entirely clear what you are asking for, since  
which(within.interval(a, -0.5, 0.5)) is actually longer than which(a >  
-0.5 & a < 0.5). You mention that you want a solution that applies to  
dataframes. Using indexing you can get entire rows of dataframes that  
satisfy multiple conditions on one of its columns:

 > DF <- data.frame(a = rnorm(20), b = LETTERS[1:20], c = letters[20:1], stringsAsFactors = FALSE)

 > DF[which( DF$a > -0.5 & DF$a < 0.5 ), ]
   # note that one needs to avoid DF[which(a > -0.5 & a<0.5) , ]
   # the "a" vector is not the same as the "a" column vector within DF
              a b c
3  -0.47310672 C r
6  -0.49784460 F o
9   0.02571058 I l
10  0.16893759 J k
11 -0.11963322 K j
12  0.39378887 L i
16  0.03712263 P e

You could also get the indices of rows that satisfy conditions on more than one column:
 > which(DF$a > 0.5 & DF$b < "K")
[1]  1  2  6 10

Or you can get rows of DF that satisfy conditions on multiple columns  
with the subset function:

 > subset(DF, a > 0.5 & b < "K")
            a b c
1  2.2500997 A t
2  0.7251357 B s
6  0.7845355 F o
10 1.0685649 J k
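
One practical difference between these forms shows up when the column contains missing values. The sketch below uses a small made-up data frame (not the DF above) to illustrate it: indexing with the bare logical vector keeps an all-NA row for each NA in the condition, while which() and subset() drop those rows.

```r
# Made-up example data with one missing value in column a
DF_na <- data.frame(a = c(-1, 0.2, NA, 0.4),
                    b = c("A", "B", "C", "D"),
                    stringsAsFactors = FALSE)

cond <- DF_na$a > -0.5 & DF_na$a < 0.5   # FALSE TRUE NA TRUE

n_logical <- nrow(DF_na[cond, ])                        # 3: includes an all-NA row
n_which   <- nrow(DF_na[which(cond), ])                 # 2: which() drops the NA
n_subset  <- nrow(subset(DF_na, a > -0.5 & a < 0.5))    # 2: subset() treats NA as FALSE
```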

Or, if you wanted a within.interval function:

 > within.interval <- function(x,a,b) { x > a & x < b}

 > which(within.interval(DF$a, -0.5, 0.5))
[1]  3  4  7  8  9 13 14 17 20
#
Hi David,

thanks a lot for your proposal. I got a lot of useful hints from all of you :-)

David Winsemius wrote:
Right, but if 'a' has a long name and '0.5' is a variable, you 
might end up with something like this (for the data frame example):

DF[which( DF$myReallyLongColumnName > -myReallyLongThreshold & 
DF$myReallyLongColumnName < -myReallyLongThreshold ), ]

instead of:

DF[which( within.interval(DF$myReallyLongColumnName, myReallyLongThreshold) ), ]

#
On Dec 16, 2008, at 7:19 AM, Antje wrote:

            
I see your point, but I must point out that no cases would ever  
satisfy that construction, because both comparisons use  
-myReallyLongThreshold as the bound.
That would be a different within.interval function than the one I  
suggested, but you could certainly create one that accepts a vector of  
bounds:

within.interval <- function(x, y) { min(y) < x & x < max(y) }
----------
 > within.interval2 <- function(x,y) { min(y) < x & x < max(y)}

 > y <- c(-.1, -.2, .1,.2)

 > which(within.interval2(DF$a,y))
[1]  7 13 14 17
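
For the long-name case discussed earlier in the thread, the vector version can be given the symmetric bounds directly. This is only a sketch: the data values, the id column, and the bounds and hits variables are made up for illustration.

```r
within.interval2 <- function(x, y) { min(y) < x & x < max(y) }

# Made-up data frame using the long names from the thread
DF_long <- data.frame(myReallyLongColumnName = c(-0.7, -0.2, 0.1, 0.6, 0.3),
                      id = letters[1:5],
                      stringsAsFactors = FALSE)
myReallyLongThreshold <- 0.5

# Build the (-threshold, +threshold) bounds once, then reuse them
bounds <- c(-1, 1) * myReallyLongThreshold
hits <- DF_long[which(within.interval2(DF_long$myReallyLongColumnName, bounds)), ]
hits$id   # "b" "c" "e"
```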
#
Here are a couple of function definitions that may be more intuitive for some people (see the examples below the function defs).  They are not perfect, but my tests showed they work left to right, right to left, outside in, but not inside out.

`%<%` <- function(x,y) {
        xx <- attr(x,'orig.y')
        yy <- attr(y,'orig.x')

        if(is.null(xx)) {
                xx <- x
                x <- rep(TRUE, length(x))
        }
        if(is.null(yy)) {
                yy <- y
                y <- rep(TRUE, length(y))
        }

        out <- x & y & (xx < yy)
        attr(out, 'orig.x') <- xx
        attr(out, 'orig.y') <- yy

        out
}

`%<=%` <- function(x,y) {
        xx <- attr(x,'orig.y')
        yy <- attr(y,'orig.x')

        if(is.null(xx)) {
                xx <- x
                x <- rep(TRUE, length(x))
        }
        if(is.null(yy)) {
                yy <- y
                y <- rep(TRUE, length(y))
        }

        out <- x & y & (xx <= yy)
        attr(out, 'orig.x') <- xx
        attr(out, 'orig.y') <- yy

        out
}




x <- -3:3

 -2 %<% x %<% 2
c( -2 %<% x %<% 2 )
x[ -2 %<% x %<% 2 ]
x[ -2 %<=% x %<=% 2 ]


x <- rnorm(100)
y <- rnorm(100)

x[ -1 %<% x %<% 1 ]
range( x[ -1 %<% x %<% 1 ] )


cbind(x,y)[ -1 %<% x %<% y %<% 1, ]
cbind(x,y)[ (-1 %<% x) %<% (y %<% 1), ]
cbind(x,y)[ ((-1 %<% x) %<% y) %<% 1, ]
cbind(x,y)[ -1 %<% (x %<% (y %<% 1)), ]
cbind(x,y)[ -1 %<% (x %<% y) %<% 1, ] # oops
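
To see why the last ("inside out") form goes wrong, the sketch below repeats the %<% definition so it runs on its own and uses three made-up pairs of values. When -1 is compared with the result of (x %<% y), the stored right-hand value is overwritten with x, so the final comparison tests x < 1 rather than y < 1.

```r
# Same logic as the %<% operator above, repeated so this snippet is self-contained
`%<%` <- function(x, y) {
  xx <- attr(x, "orig.y")
  yy <- attr(y, "orig.x")
  if (is.null(xx)) { xx <- x; x <- rep(TRUE, length(x)) }
  if (is.null(yy)) { yy <- y; y <- rep(TRUE, length(y)) }
  out <- x & y & (xx < yy)
  attr(out, "orig.x") <- xx
  attr(out, "orig.y") <- yy
  out
}

# Made-up values chosen so the two forms disagree on the second pair
x <- c(0, 0.5, 2)
y <- c(0.5, 3, 3)

intended <- -1 < x & x < y & y < 1         # TRUE FALSE FALSE
chained  <- c(-1 %<% x %<% y %<% 1)        # c() drops the helper attributes
oops     <- c(-1 %<% (x %<% y) %<% 1)      # TRUE TRUE FALSE

identical(chained, intended)               # TRUE
identical(oops, -1 < x & x < y & x < 1)    # TRUE: y < 1 was never tested
```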

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
#
Hi, 
If you can formulate your question in terms of the actual problem you have with your data.frame, it would be easier to answer.

For the time being, check whether subset() does what you want.

SV.
On Tue, 16 Dec 2008 11:09:19 +0100, Antje <niederlein-rstat at yahoo.de> wrote:

            
#
Thanks a lot for every answer I got!
I could solve my problem!

Greg, your proposal seems to be quite useful for me :-) Thank you.

Ciao,
Antje



Antje wrote: