
Classifying values by interval

5 messages · Dimitris Rizopoulos, (Ted Harding), Jim Lemon

#
Greetings All!
As is often the case on this list, the answer may well
be under my nose but I can't see it!

I am looking for a "smart" way to do the following.

Say I have a vector of values, X. I set up "bins" for X,
say with breaks at B = c(b1,b2,...,b11) covering the
range of X, i.e. bins numbered 1:10. The value x is in
bin i if B[i] < x <= B[i+1]

What I seek is a vector, of the same length as X, which
for each x in X gives the number of the bin that x is in.

Clearly this can be done in an "unsmart" way by looping
through all of X along with something like

  which( (B[1:10] < X[j]) & (X[j] <= B[2:11]) )

However, I feel that this naturally occurring task must
have received a smarter solution! The hist() function
already does this implicitly, since it has to decide
which bin a value in X should be counted in. But it
apparently then discards this information, since there
is nothing relevant in the return values from hist().

So is there a "smart" function somewhere for this?

The motivation here is that I have multivariate data,
(X,Y,Z,...) and I wish to study how it behaves in each
different bin for X. So the "bin index", ixB say, derived
for X can be applied to select corresponding subsets of
the other variables. Rather than doing it the clumsy
way each time, e.g. according to

  Y[(B[j] < X) & (X <= B[j+1])]

I would like to have the bin index permanently available
-- for example it allows easy logical combinations of
bins, such as Y[(ixB==j1) | (ixB==j2)], or Y[(ixB %in% ixB0)].
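For readers following along, here is a minimal sketch of the bin index described above, built directly from the rule B[i] < x <= B[i+1] and then used to select subsets of Y. The names X, Y, B and ixB follow the message; the data values, the seed and the bin numbers picked out are made up for illustration.

```r
# Hypothetical data; only the names X, Y, B, ixB come from the message.
set.seed(1)
B <- seq(0, 10, by = 1)               # 11 breaks, hence bins 1:10
X <- runif(50, min = 0.01, max = 10)  # every value lies inside the breaks
Y <- rnorm(50)

# Direct translation of the rule B[i] < x <= B[i+1], one x at a time.
nb <- length(B) - 1
ixB <- sapply(X, function(x) which(B[1:nb] < x & x <= B[2:(nb + 1)]))

# Select Y for a single bin, or for several bins at once.
Y_bin3   <- Y[ixB == 3]
Y_bins27 <- Y[ixB %in% c(2, 7)]

# Per-bin summary of Y, keyed by bin number.
bin_means <- tapply(Y, ixB, mean)
```

This is the "unsmart" baseline: it is correct when every x falls in exactly one bin, and it is handy for checking a faster method against.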

With thanks,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 31-Aug-11                                       Time: 09:00:27
------------------------------ XFMail ------------------------------
#
You are probably looking for the function findInterval().
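A short sketch of how findInterval() might be applied here. The breaks and values are made up. Note that by default findInterval() treats intervals as closed on the left, [B[i], B[i+1]), which differs from the rule in the question. The left.open argument used below switches to (B[i], B[i+1]]; it is assumed that the installed R is recent enough to have it.

```r
# Hypothetical breaks and values, chosen so some values sit exactly on a break.
B <- c(0, 2, 4, 6, 8, 10)
X <- c(0.5, 2, 2.1, 6, 9.9, 10)

# Default: B[i] <= x < B[i+1], so a value equal to a break goes to the upper bin.
ix_default <- findInterval(X, B)
# 1 2 2 4 5 6

# left.open = TRUE: B[i] < x <= B[i+1], the rule stated in the question.
ixB <- findInterval(X, B, left.open = TRUE)
# 1 1 2 3 5 5
```

Values outside the breaks are not dropped: findInterval() returns 0 below the first break and length(B) above the last, so it may be worth checking the range of the result.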


I hope it helps.

Best,
Dimitris
On 8/31/2011 10:00 AM, Ted Harding wrote:
#
Thanks, Dimitris. That looks hopeful!
Ted.
On 31-Aug-11 08:06:23, Dimitris Rizopoulos wrote:
--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 31-Aug-11                                       Time: 09:14:19
------------------------------ XFMail ------------------------------
#
On 08/31/2011 06:00 PM, Ted Harding wrote:
Hi Ted,
Are you looking for something like this?

x <- sample(1:10, 20, TRUE)
x
#  [1]  5 10 10  9  1  1  1  7  2  1  2  1  1  1  9  7  8  5  6  8
binx <- cut(x, breaks = 0:10)
as.numeric(binx)
#  [1]  5 10 10  9  1  1  1  7  2  1  2  1  1  1  9  7  8  5  6  8

As binx is a factor, coercing it to numeric should return the bin number 
for each value.
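In the example above the data are integers and the breaks are 0:10, so each bin number happens to equal the value itself. A sketch with made-up non-integer data and wider bins may show the mapping more clearly. cut() defaults to right = TRUE, i.e. intervals (B[i], B[i+1]], and labels = FALSE returns the integer codes without building a factor first.

```r
# Hypothetical breaks and values, chosen so some values sit exactly on a break.
B <- c(0, 2, 4, 6, 8, 10)
X <- c(0.5, 2, 2.1, 6, 9.9, 10)

# Factor route: the levels are interval labels, the codes are bin numbers.
binx <- cut(X, breaks = B)
ix_factor <- as.integer(binx)

# labels = FALSE skips the factor and returns the bin numbers directly.
ixB <- cut(X, breaks = B, labels = FALSE)

# Values outside (B[1], B[length(B)]] come back as NA.
ix_outside <- cut(c(0, 11), breaks = B, labels = FALSE)
```

If the smallest break itself should count as belonging to bin 1, include.lowest = TRUE can be passed to cut().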

Jim
#
On 31-Aug-11 08:25:15, Jim Lemon wrote:
Thanks, Jim! That looks neat too. According to ?cut, findInterval
(as suggested by Dimitris) may be more efficient, but I'll have
to look more closely into all these possibilities.
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 31-Aug-11                                       Time: 09:39:17
------------------------------ XFMail ------------------------------