
Collapse factor levels

5 messages · Kevin E. Thorpe, Jorge Ivan Velez, David Winsemius +1 more

#
I'm sure this is simple enough, but an R site search on my subject
terms did not suggest a solution.  I have a numeric vector with many
values from which I wish to create a factor having only a few levels.
Here is a toy example.

 > x <- 1:10
 > x <- factor(x,levels=1:10,labels=c("A","A","A","B","B","B","C","C","C","C"))
 > x
  [1] A A A B B B C C C C
Levels: A A A B B B C C C C
 > summary(x)
A A A B B B C C C C
3 0 0 3 0 0 4 0 0 0

So, there are clearly still 10 underlying levels.  The results I would
like to see from printing the value and summary(x) are:

 > x
  [1] A A A B B B C C C C
Levels: A B C
 > summary(x)
A B C
3 3 4

Hopefully this makes sense.

Thanks,

Kevin
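A sketch of one alternative not raised in this thread, assuming the groups are defined by the cut points 0, 3, 6 and 10 (chosen here to fit the toy example): base R's cut() bins a numeric vector into intervals and creates only the labelled levels, so the result prints and summarizes the way Kevin asks for.

```r
x <- 1:10
# cut() assigns each value to a right-closed interval:
# (0,3] -> "A", (3,6] -> "B", (6,10] -> "C".
# The break points are an assumption matching the toy example.
f <- cut(x, breaks = c(0, 3, 6, 10), labels = c("A", "B", "C"))
print(f)           # only three levels are created
print(summary(f))  # counts per level: A 3, B 3, C 4
```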
#
On Nov 1, 2009, at 3:51 PM, Kevin E. Thorpe wrote:

You have thusly created a pathological situation. In 2.10.0 this is  
what you might see:

 > x <- factor(x,levels=1:10,labels=c("A","A","A","B","B","B","C","C","C","C"))
Warning message:
In `levels<-`(`*tmp*`, value = c("A", "A", "A", "B", "B", "B", "C",  :
   duplicated levels will not be allowed in factors anymore
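A note for readers on later R versions: the current help page for factor() documents that duplicated values in labels map different values of x to the same factor level, so on such a release Kevin's original call should already give three levels without a warning. A minimal check, assuming a recent R release (older releases behave as shown above):

```r
x <- 1:10
# On recent R releases, duplicated labels are merged into one level.
f <- factor(x, levels = 1:10,
            labels = c("A", "A", "A", "B", "B", "B", "C", "C", "C", "C"))
print(levels(f))   # expected on a recent release: "A" "B" "C"
print(summary(f))
```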

What you _should_ have done was:

  x2 <- factor(c("A","A","A","B","B","B","C","C","C","C"))
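A sketch expanding on this suggestion: building the factor from the labels themselves means only three levels ever exist. The rep() form is an equivalent shorthand added here for illustration, not part of the original reply.

```r
# Build the factor from the labels, so only A, B and C become levels.
x2 <- factor(c("A","A","A","B","B","B","C","C","C","C"))
# Same factor via rep(): each label repeated by its count.
x3 <- factor(rep(c("A", "B", "C"), times = c(3, 3, 4)))
stopifnot(identical(x2, x3))
print(summary(x2))  # A 3, B 3, C 4
```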

The usual approach to getting rid of unused factor levels is just to  
apply the function factor() again without additional arguments.

 > x <- factor(x)  # the "x" was from your code
Warning message:
In `levels<-`(`*tmp*`, value = c("A", "A", "A", "B", "B", "B", "C",  :
   duplicated levels will not be allowed in factors anymore

# but that will be the last time you will see the warning.

 > summary(x)
A B C
3 3 4
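A small sketch of the re-factoring idiom on a subset that has lost one level; the variable names are illustrative only. Later versions of base R also provide droplevels(), which should give the same levels here.

```r
f <- factor(c("A", "B", "C", "C"))
g <- f[f != "C"]   # subsetting keeps all original levels
print(levels(g))   # "C" is still a level, with zero observations
# Re-applying factor() rebuilds the levels from the values present.
g1 <- factor(g)
# droplevels() is a later base R alternative for the same step.
g2 <- droplevels(g)
print(levels(g1))
print(summary(g2))
```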
#
Kevin E. Thorpe wrote:
It's an anomaly inherited from S-PLUS (or so I have been told). 
Actually, with the current R, you should get a warning:

 > x <- 1:10
 > x <- factor(x,levels=1:10,labels=c("A","A","A","B","B","B","C","C","C","C"))
Warning message:
In `levels<-`(`*tmp*`, value = c("A", "A", "A", "B", "B", "B", "C",  :
   duplicated levels will not be allowed in factors anymore

This works (as documented on the help page for levels!):

 > x <- 1:10
 > x <- factor(x,levels=1:10)
 > levels(x) <- c("A","A","A","B","B","B","C","C","C","C")
 > table(x)
x
A B C
3 3 4
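The help page for levels also documents a named-list form of the replacement, where each name is a new level and each element lists the old levels that collapse into it. A sketch applying it to the same toy data (old levels are given as character strings because the levels of a factor are character):

```r
x <- factor(1:10, levels = 1:10)
# Named-list form: names are the new levels, elements are old levels.
levels(x) <- list(A = as.character(1:3),
                  B = as.character(4:6),
                  C = as.character(7:10))
print(table(x))    # A 3, B 3, C 4
```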
#
Peter Dalgaard wrote:
Thanks.  That's exactly what I need.  I knew it was simple.
I've even used levels() before, but it just didn't occur to
me this time.  I'm clearly not on current R. :-)
When I have some time, I'll upgrade.

Kevin