Having trouble converting a dataframe of character vectors to factors
Please re-read ?sapply and pay particular attention to the "simplify" argument. The following examples should help explain the issues:
z <- data.frame(a=letters[1:3],b=letters[4:6],stringsAsFactors=FALSE)
sapply(z,class)
          a           b
"character" "character"
z1 <- sapply(z,as.factor)
sapply(z1,class)
          a           b           c           d           e           f
"character" "character" "character" "character" "character" "character"
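A quick way to see what happened to z1 (a base R sketch that only re-creates the toy z above): sapply() collapsed the two length-3 factors into a 3 x 2 matrix, and building that matrix dropped the factor class, leaving character values. sapply(z1, class) then walks over the six cells rather than the two columns, and because z1 is character it also uses the cell values as names, which is where the names a through f come from.

```r
z <- data.frame(a = letters[1:3], b = letters[4:6], stringsAsFactors = FALSE)
z1 <- sapply(z, as.factor)

# sapply() simplified the two length-3 results into a 3 x 2 matrix
is.matrix(z1)    # TRUE
dim(z1)          # 3 2

# the factor class was dropped when the matrix was built
typeof(z1)       # "character"
colnames(z1)     # "a" "b"

# sapply()/lapply() over a matrix visit each cell, not each column
length(z1)       # 6
```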
z2 <- sapply(z,factor, simplify = FALSE)
sapply(z2,class)
       a        b
"factor" "factor"
z3 <- lapply(z,factor)
sapply(z3,class)
       a        b
"factor" "factor"
z3
$a
[1] a b c
Levels: a b c

$b
[1] d e f
Levels: d e f

## Note that both z2 and z3 are lists, and would have to be converted back to data frames.

-- Bert
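Two ways that conversion could be done, sketched with the same toy z: wrap the list with as.data.frame(), or assign the list into z[] so the object keeps its data.frame class and row names. The z[] <- lapply(...) form is a common base R idiom, not something prescribed by ?sapply.

```r
z <- data.frame(a = letters[1:3], b = letters[4:6], stringsAsFactors = FALSE)

# lapply() returns a plain list of factors, not a data frame
z3 <- lapply(z, factor)
is.data.frame(z3)        # FALSE

# Option 1: rebuild a data frame from the list (factors stay factors)
z4 <- as.data.frame(z3)
sapply(z4, class)        # a: "factor", b: "factor"

# Option 2: replace every column of z in place; z stays a data frame
z[] <- lapply(z, factor)
sapply(z, class)         # a: "factor", b: "factor"
```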
On Wed, Feb 20, 2013 at 4:09 PM, Lopez, Dan <lopez235 at llnl.gov> wrote:
R Experts,

I have a dataframe made up of character vectors; these are results from survey questions. I need to convert them to factors. I tried the following, which did not work:

scs2<-sapply(scs2,as.factor)

This also didn't work:

scs2<-sapply(scs2,function(x) as.factor(x))

After doing either of the above I end up with:
str(scs2)
 chr [1:10, 1:10] "very important" "very important" "very important" "very important" ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:10] "Q1_1" "Q1_2" "Q1_3" "Q1_4" ...
class(scs2)
"matrix"
But when I do it one at a time it works:
scs2$Q1_1<-as.factor(scs2$Q1_1)
scs2$Q1_2<- as.factor(scs2$Q1_2)
What am I doing wrong? How do I accomplish this with sapply or a similar function?
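The one-at-a-time assignments generalize to any set of columns by assigning a list of factors into a column subset of the data frame. A minimal sketch, assuming a small hypothetical data frame named survey (the numeric column n is made up to show that non-character columns are left alone):

```r
# Hypothetical example data: two character columns and one numeric column
survey <- data.frame(
  Q1_1 = c("very important", "important", "very important"),
  Q2   = c("Somewhat", "Very Much", "Very Much"),
  n    = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Select the columns to convert; here, every character column
chr_cols <- vapply(survey, is.character, logical(1))

# Same effect as survey$Q1_1 <- as.factor(survey$Q1_1) etc.,
# but for all selected columns at once; survey stays a data frame
survey[chr_cols] <- lapply(survey[chr_cols], as.factor)

str(survey)
```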
Data for reproducibility:
scs2<-structure(list(Q1_1 = c("very important", "very important", "very important",
"very important", "very important", "very important", "very important",
"somewhat important", "important", "very important"), Q1_2 = c("important",
"somewhat important", "very important", "important", "important",
"very important", "somewhat important", "somewhat important",
"very important", "very important"), Q1_3 = c("very important",
"important", "very important", "very important", "important",
"very important", "very important", "somewhat important", "not important",
"important"), Q1_4 = c("very important", "important", "very important",
"very important", "important", "important", "important", "very important",
"somewhat important", "important"), Q1_5 = c("very important",
"not important", "important", "very important", "not important",
"important", "somewhat important", "important", "somewhat important",
"not important"), Q1_6 = c("very important", "not important",
"important", "very important", "somewhat important", "very important",
"very important", "very important", "important", "important"),
Q1_7 = c("very important", "somewhat important", "important",
"somewhat important", "important", "important", "very important",
"very important", "somewhat important", "not important"),
Q2 = c("Somewhat", "Very Much", "Somewhat", "Very Much",
"Very Much", "Very Much", "Very Much", "Very Much", "Very Much",
"Very Much"), Q3 = c("yes", "yes", "yes", "yes", "yes", "yes",
"yes", "yes", "yes", "yes"), Q4 = c("None", "None", "None",
"None", "Confirmed Field of Study", "Confirmed Field of Study",
"Confirmed Field of Study", "None", "None", "None")), .Names = c("Q1_1",
"Q1_2", "Q1_3", "Q1_4", "Q1_5", "Q1_6", "Q1_7", "Q2", "Q3", "Q4"
), row.names = c(78L, 46L, 80L, 196L, 188L, 197L, 39L, 195L,
172L, 110L), class = "data.frame")
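Applied to the posted data, assigning lapply(scs2, factor) into scs2[] keeps scs2 a data frame with its original row names, whereas scs2 <- sapply(...) replaces it with a character matrix. To keep the sketch short and runnable on its own, it rebuilds only the Q2, Q3 and Q4 columns (values and row names copied from the structure() call above); the Q1_* columns would convert the same way.

```r
# Q2, Q3 and Q4 from the posted data, with the same row names
scs2 <- data.frame(
  Q2 = c("Somewhat", "Very Much", "Somewhat", "Very Much", "Very Much",
         "Very Much", "Very Much", "Very Much", "Very Much", "Very Much"),
  Q3 = rep("yes", 10),
  Q4 = c("None", "None", "None", "None", "Confirmed Field of Study",
         "Confirmed Field of Study", "Confirmed Field of Study",
         "None", "None", "None"),
  row.names = c(78L, 46L, 80L, 196L, 188L, 197L, 39L, 195L, 172L, 110L),
  stringsAsFactors = FALSE
)

# Replace every column in place with its factor version
scs2[] <- lapply(scs2, factor)

class(scs2)              # "data.frame"
sapply(scs2, nlevels)    # Q2: 2, Q3: 1, Q4: 2
```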
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm