
Easily switchable factor levels

2 messages · Barry Rowlingson, Heinz Tuechler

#
I've recently been working with some California county-level data. The
counties can be referred to by FIPS codes, e.g. F060102, by friendly
names such as "Del Norte County", by names without 'County' on the end,
or by names with 'CA' on the end ("Del Norte County, CA"). Different data
sets use slightly different forms and putting them all together is a
pain.

 So I was wondering about ways to attach multiple sets of level codes
to a factor. It would work something like this:

 > foo=multifactor(sample(letters,5),levels=letters,levelname="lower")
 > foo
 [1] m u i z b
 Levels: a b c d ... y z
 > levels(foo,"upper") = LETTERS
 > uselevels(foo,"upper")
 > foo
 [1] M U I Z B
  Levels: A B C D E F....Z
 > uselevels(foo,"lower")
 > foo
 [1] m u i z b
  Levels: a b c d ....z

In this way you could easily switch your levels from M and F to Male
and Female, or Hommes et Dames, without having to do levels(foo) =
something and hope to get the ordering right every time. Just do it
once, and keep the multiple sets of level labels in the object.

I'd even throw in a function to print out all the level codes:

 > levels(foo,all=TRUE)
     upper lower
 [1] A     a
 [2] B     b

etc

I can see assorted problems coding this up to cope with dropping
levels when making subsets... and possibly problems when code does
character matching of levels and expects them to be unchanged...
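
For what it's worth, here is a rough sketch of how the idea above might be
prototyped as an S3 class. Nothing here exists in base R: multifactor,
addlevels, uselevels, tofactor and alllevels are hypothetical names, and the
storage layout (integer codes plus a named list of label sets) is only one
possible choice. Because R copies on modify, uselevels() has to return a new
object that is assigned back, rather than changing foo in place as in the
mock session.

```r
# Hypothetical sketch only: none of these functions exist in base R.

# Store integer codes, a named list of label sets, and the active set name.
multifactor <- function(x, levels, levelname) {
  labelsets <- list(levels)
  names(labelsets) <- levelname
  structure(list(codes = match(x, levels),
                 labelsets = labelsets,
                 active = levelname),
            class = "multifactor")
}

# Register another label set; it must line up with the existing levels.
addlevels <- function(mf, name, labels) {
  stopifnot(length(labels) == length(mf$labelsets[[1]]))
  mf$labelsets[[name]] <- labels
  mf
}

# Choose which label set is shown; the result must be assigned back.
uselevels <- function(mf, name) {
  stopifnot(name %in% names(mf$labelsets))
  mf$active <- name
  mf
}

# Convert to an ordinary factor using the active label set.
tofactor <- function(mf) {
  labs <- mf$labelsets[[mf$active]]
  factor(labs[mf$codes], levels = labs)
}

# Show every label set side by side.
alllevels <- function(mf) {
  as.data.frame(mf$labelsets, stringsAsFactors = FALSE)
}

print.multifactor <- function(x, ...) {
  print(tofactor(x))
  invisible(x)
}

# Subsetting touches only the codes, so all label sets survive.
# Unused levels are deliberately not dropped in this sketch.
"[.multifactor" <- function(x, i) {
  x$codes <- x$codes[i]
  x
}

foo <- multifactor(c("m", "u", "i", "z", "b"), levels = letters,
                   levelname = "lower")
foo <- addlevels(foo, "upper", LETTERS)
foo <- uselevels(foo, "upper")
foo
head(alllevels(foo), 2)
```

Code that does character matching on levels would still need to go through
tofactor() first, so the second problem mentioned above is not solved here.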

Has anyone bothered to write anything like this yet? Or is the
application a bit too rare to be worth it?

Barry
#
To me this is a common situation, especially when switching between two
languages. I solve it by separating the coding of values from their
labels. Values are coded numerically or as character, and their
labels are attached through a value.label attribute. When needed, a
modified factor function transforms these variables into factors,
using the value.labels as the factor labels.
It is not pretty code, however, and a drawback is that the value.label
attribute has to be copied on subsetting.
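
A minimal sketch of this kind of approach, not the actual code: the
function name labelled_factor, the attribute layout (a list with one named
vector per language) and the example data are all assumptions made for
illustration.

```r
# Hypothetical sketch; names and attribute layout are assumptions.

# Values are coded numerically; labels live in an attribute,
# one named vector per language (label = code).
sex <- c(1, 2, 2, 1)
attr(sex, "value.labels") <- list(
  en = c(Male = 1, Female = 2),
  fr = c(Hommes = 1, Dames = 2)
)

# Build a factor from the coded values using the chosen label set.
labelled_factor <- function(x, which = 1,
                            value.labels = attr(x, "value.labels")) {
  vl <- value.labels[[which]]
  factor(as.vector(x), levels = unname(vl), labels = names(vl))
}

labelled_factor(sex, "en")
labelled_factor(sex, "fr")

# The drawback: plain subsetting drops the attribute,
# so it has to be copied over by hand.
part <- sex[1:2]
is.null(attr(part, "value.labels"))
attr(part, "value.labels") <- attr(sex, "value.labels")
labelled_factor(part, "fr")
```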

best regards,

Heinz
At 23.02.2011 22:23 +0000, Barry Rowlingson wrote: