What is the quickest way to create many categorical variables (factors) from continuous variables?

This is the approach that I have used:

    # create sample data
    N <- 20
    x <- runif(N,0,1)

    # setup ranges to define categories
    x.a <- (x >= 0.0) & (x < 0.4)
    x.b <- (x >= 0.4) & (x < 0.5)
    x.c <- (x >= 0.5) & (x < 0.6)
    x.d <- (x >= 0.6) & (x < 1.0)

    # create factors
    i <- runif(N,1,1)
    x.new <- (i*1*x.a) + (i*2*x.b) + (i*3*x.c) + (i*4*x.d)
    x.factor <- factor(x.new)

I'm looking for a better / simpler / more elegant / more robust (as the number of categories increases) way to do this. I also don't like that my factor names can only be numbers in this example. I would prefer a solution to take a form like the following (inspired by the "hist" function):

    # define breakpoints
    x.breaks = c(0, 0.4, 0.5, 0.6, 1.0)
    x.factornames = c( "0 - 0.4", "0.4 - 0.5", "0.5 - 0.6", "0.6 - 1.0" )
    x.factor = unknown.function( x, x.breaks, x.factornames )

Thanks,
David

P.S. Here's what I have read to try to find the answer to my problem:
* "Introductory Statistics with R"
* "A Brief Guide to R for Beginners in Econometrics"
* "Econometrics in R"
Creating factors from continuous variables
4 messages · David James, Bert Gunter, Brian Ripley +1 more
?cut

--
Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA

"The business of the statistician is to catalyze the scientific learning process." - George E. P. Box
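A minimal sketch of what the `?cut` pointer amounts to for this question, using base R's cut() with the break points and labels David proposed. Two things here are additions for illustration, not from the thread: the set.seed() call (only so the run is reproducible) and the cross-check against the original indicator coding (with the all-ones multiplier i dropped). right = FALSE is chosen so the intervals are [a, b), matching the (x >= a) & (x < b) tests in the original code.

```r
# Sample data as in the question; set.seed() added only for reproducibility
set.seed(1)
N <- 20
x <- runif(N, 0, 1)

# Break points and labels exactly as proposed in the question
x.breaks <- c(0, 0.4, 0.5, 0.6, 1.0)
x.factornames <- c("0 - 0.4", "0.4 - 0.5", "0.5 - 0.6", "0.6 - 1.0")

# cut() takes the place of the hypothetical unknown.function().
# right = FALSE closes each interval on the left, [a, b),
# which matches the (x >= a) & (x < b) tests in the original code.
x.factor <- cut(x, breaks = x.breaks, labels = x.factornames, right = FALSE)

# Cross-check: the factor codes 1..4 agree with the indicator-based coding
x.new <- 1 * ((x >= 0.0) & (x < 0.4)) + 2 * ((x >= 0.4) & (x < 0.5)) +
         3 * ((x >= 0.5) & (x < 0.6)) + 4 * ((x >= 0.6) & (x < 1.0))
stopifnot(identical(as.integer(x.factor), as.integer(x.new)))

# Counts per labelled category
print(table(x.factor))
```

Adding a category only means adding one break and one label; no new indicator variable is needed, which addresses the robustness concern in the question.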
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of David James
Sent: Friday, August 26, 2005 2:00 PM
To: r-help at stat.math.ethz.ch
Subject: [R] Creating factors from continuous variables

What is the quickest way to create many categorical variables (factors) from continuous variables? [...]
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
?cut

This is in `An Introduction to R', the manual which ships with R and basic reading.
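One detail from the cut() help page matters for this question: which end of each interval is closed. The sketch below uses a few hypothetical values placed exactly on the break points to show the documented right and include.lowest arguments. With the default right = TRUE the intervals are (a, b], so a value equal to the lowest break falls outside every interval and becomes NA.

```r
# Hypothetical values chosen to sit on (or near) the break points
x <- c(0, 0.4, 0.45, 0.5, 0.6, 0.99)
x.breaks <- c(0, 0.4, 0.5, 0.6, 1.0)

# Default right = TRUE: intervals (a, b]; x = 0 is outside (0, 0.4] -> NA
f.right <- cut(x, breaks = x.breaks)

# include.lowest = TRUE closes the first interval at 0, giving [0, 0.4]
f.incl <- cut(x, breaks = x.breaks, include.lowest = TRUE)

# right = FALSE: intervals [a, b), as in the question's (x >= a) & (x < b)
f.left <- cut(x, breaks = x.breaks, right = FALSE)

# Without a labels argument, cut() builds interval labels from the breaks
print(levels(f.right))  # "(0,0.4]" "(0.4,0.5]" "(0.5,0.6]" "(0.6,1]"
print(levels(f.left))   # "[0,0.4)" "[0.4,0.5)" "[0.5,0.6)" "[0.6,1)"
```

The automatically generated labels may already be close to the "0 - 0.4" style names David wanted, and they state the interval closure explicitly.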
On Fri, 26 Aug 2005, David James wrote:
What is the quickest way to create many categorical variables (factors) from continuous variables? [...]
--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
Prof Brian Ripley <ripley at stats.ox.ac.uk> writes:
?cut

This is in `An Introduction to R', the manual which ships with R and basic reading.
...as well as in the Compendium section and in an exercise in Ch.1 in at least one of the cited references.
P.S. Here's what I have read to try to find the answer to my problem:
* "Introductory Statistics with R"
* "A Brief Guide to R for Beginners in Econometrics"
* "Econometrics in R"
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907