Making a factor with common levels ...
The combine.levels function in the Hmisc library is
related to this:
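combine.levels <- function(x, minlev=.05) {
  x <- as.factor(x)
  lev <- levels(x)
  f <- table(x)/sum(!is.na(x))   # relative frequency of each level
  i <- f < minlev                # TRUE for levels rarer than minlev
  si <- sum(i)
  if(si==0) return(x)            # nothing to combine
  # A single rare level is pooled with the next least frequent level
  comb <- if(si==1) names(sort(f))[1:2] else names(f)[i]
  # Relabel with a character vector: repeated labels merge levels and
  # the levels not being combined keep their names
  lev[lev %in% comb] <- "OTHER"
  levels(x) <- lev
  x
}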
This pools the levels whose relative frequency is below 'minlev'
into a single new category (a lone rare level is merged with the
next least frequent one). -Frank Harrell
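
For the question quoted below, a minimal base R sketch may be closer to what was asked. It is not part of Hmisc. It assumes that a name is kept when it occurs more than N times in at least one of the two vectors, as the nn expression in the question does. The example data and the helper pool() are hypothetical. n0, n1, N, nn, nn0 and nn1 follow the names used in the question.

```r
# Hypothetical example data; n0, n1 and N follow the names in the question.
n0 <- c("ann", "ann", "ann", "bob", "bob", "cat")
n1 <- c("bob", "bob", "bob", "dan", "dan", "dan", "ann")
N  <- 2

# Names that occur more than N times in at least one of the vectors.
t0 <- table(n0)
t1 <- table(n1)
nn <- sort(unique(c(names(t0)[t0 > N], names(t1)[t1 > N])))

# Build both factors on the same level set, so the levels stay aligned.
# Names that are not in nn become NA.
nn0 <- factor(n0, levels = nn)
nn1 <- factor(n1, levels = nn)

# Variant in the spirit of combine.levels: instead of NA, pool the
# remaining names into a shared "OTHER" level (pool is a hypothetical helper).
pool <- function(x, keep) {
  x <- as.character(x)
  x[!is.na(x) & !(x %in% keep)] <- "OTHER"
  factor(x, levels = unique(c(keep, "OTHER")))
}
p0 <- pool(n0, nn)
p1 <- pool(n1, nn)
```

Because both factors are created with an explicit levels argument, table(nn0) and table(nn1) report the same names in the same order, with a zero count where a name is absent from one vector. No loops are needed.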
j.logsdon at lancaster.ac.uk wrote:
This is doing my head in. Staying away from R for too long is bad for the health!

I have two vectors of character names where there may be repetition and from which I want to form two factors with the same levels but only if there are more than N instances of each name in each vector. I can get the list of common names quite easily, using:

nn<-sort(unique(c(levels(n1)[table(n1)>N],levels(n0)[table(n0)>N])))

Some of the factor levels may be empty for one of the factors but the same level must be present in the other. Is there a simple way to extract nn0 and nn1 so that the pairs remain correctly aligned and each list has at least N cases of each name? Or do I have to jump into my steamroller and do a couple of loops?

TIA

John

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Frank E Harrell Jr
Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem.
Dept. of Health Evaluation Sciences
U. Virginia School of Medicine
http://hesweb1.med.virginia.edu/biostat