Hi all,
I had some trouble in regrouping factor levels for a variable. After some experiments, I have figured out how I can recode to modify the factor levels. I would now like some help to understand why some methods work and others don't.
Here's my code:
rm(list=ls())
###some trials in recoding factor levels
char<-letters[1:10]
fac<-factor(char)
levels(fac)
print(fac)
##first method of recoding factors
fac1<-fac
levels(fac1)[c("a","b","c")]<-"A"
levels(fac1)[c("d","e","f")]<-"B"
levels(fac1)[c("g","h","i","j")]<-"C"
levels(fac1)
print(fac1)
##second method
fac2<-fac
levels(fac2)[c(1,2,3)]<-"A"
levels(fac2)[c(2,3,4)]<-"B" # not c(4,5,6)
levels(fac2)[c(3,4,5,6)]<-"C" # not c(7,8,9,10)
levels(fac2)
print(fac2)
#third method
fac3<-fac
levels(fac3)<-list("A"=c("a","b","c"),"B"=c("d","e","f"),"C"=c("g","h","i","j"))
levels(fac3)
print(fac3)
I first tried method 1 and had no luck with it at all. The levels A, B, and C just got added to the existing levels without affecting the fac variable.
After some time, I was able to figure out how I should use method 2.
After reading the help documentation, I arrived at method 3.
I would appreciate help in understanding why the first method does not work. In my application, I had long factor names and Tinn-R just would not accept statements running to several lines. Partial substitution was desirable then. Having spent a considerable amount of time on this, I would like to understand the underlying problem with method 1 as it is. The deeper understanding could be useful for me later.
Thanking You,
Ravi
regrouping factor levels
2 messages · ravi, PIKAL Petr
2 days later
Hi,
r-help-bounces at r-project.org wrote on 22.05.2009 18:53:37:
> After some time, I was able to figure out how I should use method 2. After
> reading the help documentation, I arrived at method 3. I would appreciate
> help in understanding why the first method does not work.
See the difference between these two selection methods:
levels(fac1)[c("a","b","c")]
and
levels(fac2)[c(1,2,3)]
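The first indexes by name and the second by position. "a", "b" and "c" are
values of the levels vector, not names of its elements, so the character
index matches nothing. You need to check for their presence with the
%in% operator.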
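To illustrate the point above, here is a small sketch of what a character index does on an unnamed vector. The object names f and x are made up for this illustration and are not from the original post. The index is looked up in names(), finds nothing there, and so new named elements are appended instead of existing ones being replaced.

```r
# levels() returns a plain character vector without names
f <- factor(letters[1:10])
names(levels(f))                 # NULL, so there is nothing for "a" to match

# A character index is looked up in names(); with no match, R appends
x <- levels(f)
x[c("a", "b", "c")] <- "A"
length(x)                        # 13: the ten old values plus three new ones
names(x)[11:13]                  # "a" "b" "c" are the names of the appended elements

# Assigned back as levels, the three appended "A" values collapse into
# one extra level, and the data themselves are left untouched
levels(f)[c("a", "b", "c")] <- "A"
nlevels(f)                       # 11
as.character(f)                  # still the letters a to j
```

This matches what was observed with method 1: A, B and C are added as new, unused levels.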
Modified method 1:
which.levels <- levels(fac1) %in% c("a","b","c")
levels(fac1)[which.levels] <- "A"
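Applied to all three groups, the modified method 1 might look like the sketch below. It starts again from a fresh copy of fac so that it runs on its own. Because each step selects levels by value, it should not matter that the positions of the remaining levels shift after each merge.

```r
fac  <- factor(letters[1:10])
fac1 <- fac

# Select the levels to merge by value with %in%, then overwrite them
levels(fac1)[levels(fac1) %in% c("a", "b", "c")]      <- "A"
levels(fac1)[levels(fac1) %in% c("d", "e", "f")]      <- "B"
levels(fac1)[levels(fac1) %in% c("g", "h", "i", "j")] <- "C"

levels(fac1)   # the three merged levels
table(fac1)    # counts per merged level
```

Each statement stays short, which may help when the editor does not accept statements running over several lines.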
Method 4:
fac4<-fac
levels(fac4)<-c(rep("A",3), rep("B", 3), rep("C",4))
You can either replace all levels at once, or select some levels and
replace them with the correct number of items.
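As a rough sketch of the two options side by side: replacing all levels at once needs one new label per old level, while replacing a selection shortens the level vector after every merge. That shortening is presumably why method 2 needs positions 2:4 and 3:6 rather than 4:6 and 7:10.

```r
fac <- factor(letters[1:10])

# Option 1: replace all levels at once, one new label per old level
fac4 <- fac
levels(fac4) <- c(rep("A", 3), rep("B", 3), rep("C", 4))

# Option 2: replace a selection; duplicated labels are merged each time,
# so the positions of the remaining levels move up
fac2 <- fac
levels(fac2)[1:3] <- "A"
levels(fac2)          # "A" "d" "e" "f" "g" "h" "i" "j"
levels(fac2)[2:4] <- "B"
levels(fac2)[3:6] <- "C"

# Both routes should give the same grouping
stopifnot(identical(levels(fac2), levels(fac4)), all(fac2 == fac4))
```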
Regards
Petr