Greetings All!

As is often the case on this list, the answer may well be under my nose but I can't see it! I am looking for a "smart" way to do the following.

Say I have a vector of values, X. I set up "bins" for X, say with breaks at

  B = c(b1,b2,...,b11)

covering the range of X, i.e. bins numbered 1:10. The value x is in bin i if

  B[i] < x <= B[i+1]

What I seek is a vector, of the same length as X, which for each x in X gives the number of the bin that x is in.

Clearly this can be done in an "unsmart" way by looping through all of X along with something like

  which( (B[1:10] < X[j]) & (X[j] <= B[2:11]) )

However, I feel that this naturally occurring task must have received a smarter solution! The hist() function already does this implicitly, since it has to decide which bin a value in X should be counted in. But it apparently then discards this information, since there is nothing relevant in the return values from hist().

So is there a "smart" function somewhere for this?

The motivation here is that I have multivariate data, (X,Y,Z,...) and I wish to study how it behaves in each different bin for X. So the "bin index", ixB say, derived for X can be applied to select corresponding subsets of the other variables. Rather than doing it the clumsy way each time, e.g. according to

  Y[(B[i] < X) & (X <= B[i+1])]

I would like to have the bin index permanently available -- for example it allows easy logical combinations of bins, such as Y[(ixB==j1) | (ixB==j2)], or Y[(ixB %in% ixB0)].

With thanks,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 31-Aug-11 Time: 09:00:27
------------------------------ XFMail ------------------------------
Classifying values by interval
5 messages · Dimitris Rizopoulos, (Ted Harding), Jim Lemon
You are probably looking for the function findInterval(). I hope it helps.

Best,
Dimitris
Dimitris Rizopoulos Assistant Professor Department of Biostatistics Erasmus University Medical Center Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands Tel: +31/(0)10/7043478 Fax: +31/(0)10/7043014 Web: http://www.erasmusmc.nl/biostatistiek/
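For readers following the thread, here is a minimal sketch of how findInterval() could be applied to the question as posed. The breaks B and the data X are made-up illustration values. Note the assumption that the left.open argument is available: it was added to findInterval() in R 3.3.0, so it postdates this thread. Without it, findInterval() uses intervals closed on the left, B[i] <= x < B[i+1], which is not quite the convention asked for.

```r
# Hypothetical data: 11 breaks define 10 bins, as in the question.
B <- seq(0, 10, by = 1)
X <- c(0.5, 1, 1.2, 9.99, 10, 3.7)

# left.open = TRUE gives intervals (B[i], B[i+1]], so a value equal to
# B[i+1] is assigned to bin i. (Assumes R >= 3.3.0.)
ixB <- findInterval(X, B, left.open = TRUE)

ixB
# [1]  1  1  2 10 10  4
```

The result has the same length as X and can be stored alongside the data as the permanent bin index.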
Thanks, Dimitris. That looks hopeful! Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 31-Aug-11 Time: 09:14:19
------------------------------ XFMail ------------------------------
Hi Ted,

Are you looking for something like this?

  x <- sample(1:10, 20, TRUE)
  x
   [1]  5 10 10  9  1  1  1  7  2  1  2  1  1  1  9  7  8  5  6  8
  binx <- cut(x, breaks = 0:10)
  as.numeric(binx)
   [1]  5 10 10  9  1  1  1  7  2  1  2  1  1  1  9  7  8  5  6  8

As binx is a factor, coercing it to numeric should return the bin number for each value.

Jim
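Because the example above uses integer data with unit-width bins, the bin numbers happen to equal the values themselves. The following sketch uses made-up, unequal data (B, X and Y are hypothetical) to show the same idea with general breaks, and how the resulting index might be used to subset a second variable as described in the original question. It uses labels = FALSE, a documented cut() argument that returns integer codes directly instead of a factor.

```r
# Hypothetical breaks (5 bins) and two variables observed together.
B <- c(0, 2, 4, 6, 8, 10)
X <- c(1.5, 2, 2.1, 7.3, 10, 4)
Y <- c(10, 20, 30, 40, 50, 60)

# cut() uses right-closed intervals (B[i], B[i+1]] by default (right = TRUE).
# labels = FALSE returns the integer bin codes rather than a factor.
ixB <- cut(X, breaks = B, labels = FALSE)
ixB
# [1] 1 1 2 4 5 2

Y[ixB == 1]           # Y values whose X lies in bin 1
Y[ixB %in% c(1, 5)]   # Y values whose X lies in bin 1 or bin 5
split(Y, ixB)         # Y grouped by bin, all bins at once
```

Storing ixB once avoids repeating the comparison against the breaks for every subset.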
Thanks, Jim! That looks neat too. According to ?cut, findInterval (as suggested by Dimitris) may be more efficient, but I'll have to look more closely into all these possibilities.

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 31-Aug-11 Time: 09:39:17
------------------------------ XFMail ------------------------------
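One point that may matter when choosing between the two suggestions is how each treats values on or outside the outermost breaks. The sketch below compares them on made-up data (B and X are hypothetical), again assuming R >= 3.3.0 for the left.open argument of findInterval(). It is an illustration of the documented defaults, not a benchmark of efficiency.

```r
# Hypothetical breaks (3 bins) and data that strays outside them.
B <- c(0, 1, 2, 3)
X <- c(-0.5, 0, 0.5, 1, 2.5, 3, 3.5)

# cut(): values outside (B[1], B[length(B)]] become NA.
by_cut <- cut(X, breaks = B, labels = FALSE)
by_cut
# [1] NA NA  1  1  3  3 NA

# findInterval(): values at or below B[1] give 0, values above the last
# break give length(B).
by_fi <- findInterval(X, B, left.open = TRUE)
by_fi
# [1] 0 0 1 1 3 3 4

# Inside the range of the breaks the two indices agree.
in_range <- !is.na(by_cut)
all(by_cut[in_range] == by_fi[in_range])
# [1] TRUE
```

If the lowest break itself should count as belonging to bin 1, cut() has include.lowest = TRUE and findInterval() has rightmost.closed = TRUE (which, per its documentation, applies to the leftmost interval when left.open = TRUE).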