Collapse factor levels
On Nov 1, 2009, at 3:51 PM, Kevin E. Thorpe wrote:
I'm sure this is simple enough, but an R site search on my subject terms did not suggest a solution. I have a numeric vector with many values, from which I wish to create a factor having only a few levels. Here is a toy example.
x <- 1:10
x <- factor(x,levels=1:10,labels=c("A","A","A","B","B","B","C","C","C","C"))
You have thus created a pathological situation: a factor with
duplicated levels. In R 2.10.0 this is what you might see:
> x <- factor(x,levels=1:10,labels=c("A","A","A","B","B","B","C","C","C","C"))
Warning message:
In `levels<-`(`*tmp*`, value = c("A", "A", "A", "B", "B", "B", "C", :
duplicated levels will not be allowed in factors anymore
What you _should_ have done was:
x2 <- factor(c("A","A","A","B","B","B","C","C","C","C"))
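The line above hard-codes the labels. When the labels have to be derived from the numeric values, something along the following lines may help. This is only a sketch built on the toy data from the question; the names labs, x3 and x4 are made up for illustration, and the break points passed to cut() assume the grouping 1-3, 4-6, 7-10.

```r
# Toy data from the question
x <- 1:10

# Approach 1: a lookup vector, where labs[i] is the label for the value i
labs <- c("A", "A", "A", "B", "B", "B", "C", "C", "C", "C")
x2 <- factor(labs[x])

# Approach 2: cut() with right-closed intervals (0,3], (3,6], (6,10]
x3 <- cut(x, breaks = c(0, 3, 6, 10), labels = c("A", "B", "C"))

# Approach 3: build the 10-level factor first, then merge its levels
# with a named list (see ?levels)
x4 <- factor(x)
levels(x4) <- list(A = c("1", "2", "3"),
                   B = c("4", "5", "6"),
                   C = c("7", "8", "9", "10"))

levels(x2)    # distinct levels only, no duplicates
summary(x2)   # counts per collapsed level
```

The lookup vector is probably the closest to the original attempt, since it reuses the same vector of labels; cut() is only suitable when the groups are contiguous ranges.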
The usual approach to getting rid of unused factor levels is just to
apply the function factor() again without additional arguments.
> x <- factor(x) # the "x" was from your code
Warning message:
In `levels<-`(`*tmp*`, value = c("A", "A", "A", "B", "B", "B", "C", :
duplicated levels will not be allowed in factors anymore
# but that will be the last time you will see the warning..
> summary(x)
A B C
3 3 4
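For the ordinary case of unused levels (as opposed to duplicated ones), a small illustration of re-applying factor() might look like this. The factor f and its subset are hypothetical data, not from the thread. Note that droplevels() was added to base R in a release later than the 2.10.0 discussed here, so it is shown only as an alternative for newer versions.

```r
# Hypothetical factor with three levels
f <- factor(c("A", "A", "B", "C", "C"))

# Subsetting keeps all original levels, even those no longer present
sub <- f[f != "B"]
levels(sub)      # "B" is still listed
summary(sub)     # "B" shows up with a count of 0

# Re-applying factor() rebuilds the levels from the values actually present
sub2 <- factor(sub)
levels(sub2)

# In newer versions of R, droplevels() does the same job
sub3 <- droplevels(sub)
levels(sub3)
```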
David.

> > x
> [1] A A A B B B C C C C
> Levels: A A A B B B C C C C
> > summary(x)
> A A A B B B C C C C
> 3 0 0 3 0 0 4 0 0 0
>
> So, there are clearly still 10 underlying levels. The results I would
> like to see from printing the value and summary(x) are:
>
> > x
> [1] A A A B B B C C C C
> Levels: A B C
> > summary(x)
> A B C
> 3 3 4
>
> Hopefully this makes sense.
>
> Thanks,
>
> Kevin
>
> --
> Kevin E. Thorpe
> Biostatistician/Trialist, Knowledge Translation Program
> Assistant Professor, Dalla Lana School of Public Health
> University of Toronto
> email: kevin.thorpe at utoronto.ca Tel: 416.864.5776 Fax: 416.864.3016

David Winsemius, MD
Heritage Laboratories
West Hartford, CT