Data type in a data frame
When read into a data.frame, R defaults to reading character strings as factors. If you don't want that, use option stringsAsFactors = FALSE.
This is somewhat tangential, but if you plan on using predict(fit,newdata=nd) after fitting a model like fit <- lm(y~x, data=d) be sure you have converted character columns in nd and d into factors. Otherwise you are likely to get errors from predict(). You will get a warning when fitting the model if you use character columns, but the results are ok until you use predict() on the result. E.g.,
d <- data.frame(y=1:10, cGroup=rep(c("A","B","C"),c(3,4,3)), fGroup=factor(rep(c("A","B","C"),c(3,4,3))), stringsAsFactors=FALSE)
fitChar <- lm(y ~ cGroup - 1, data=d[1:9,])
Warning message:
In model.matrix.default(mt, mf, contrasts) :
  variable 'cGroup' converted to a factor
fitFactor <- lm(y ~ fGroup - 1, data=d[1:9,])
coef(fitChar)
cGroupA cGroupB cGroupC
    2.0     5.5     8.5
coef(fitFactor)
fGroupA fGroupB fGroupC
    2.0     5.5     8.5
# so far things are ok, but ...
predict(fitChar, newdata=d[10,])
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels
In addition: Warning message:
In model.matrix.default(Terms, m, contrasts.arg = object$contrasts) :
  variable 'cGroup' converted to a factor
predict(fitFactor, newdata=d[10,])
 10
8.5
predict(fitChar, newdata=d[c(1,10),])
Error in predict.lm(fitChar, newdata = d[c(1, 10), ]) :
  subscript out of bounds
In addition: Warning message:
In model.matrix.default(Terms, m, contrasts.arg = object$contrasts) :
  variable 'cGroup' converted to a factor
predict(fitFactor, newdata=d[c(1,10),])
  1  10
2.0 8.5

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
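One possible way to follow the advice above is to convert the character column to a factor with an explicit set of levels before fitting, and to give the new data the same levels. The following is only a minimal sketch under that assumption; the names lev, train, nd and nd2 are introduced here for illustration and do not appear in the original message.

```r
# Same data as in the example above, with the grouping column kept as character.
d <- data.frame(y = 1:10,
                cGroup = rep(c("A", "B", "C"), c(3, 4, 3)),
                stringsAsFactors = FALSE)

# Fix the level set once, from the full data (assumed here to be known up front).
lev <- sort(unique(d$cGroup))
d$cGroup <- factor(d$cGroup, levels = lev)

train <- d[1:9, ]                 # rows used for fitting
nd    <- d[10, , drop = FALSE]    # row to predict; subsetting keeps all levels

fit <- lm(y ~ cGroup - 1, data = train)
predict(fit, newdata = nd)

# New data that arrives as plain character strings needs the same levels.
nd2 <- data.frame(cGroup = factor("C", levels = lev))
predict(fit, newdata = nd2)
```

Setting the levels explicitly, rather than letting factor() infer them from whatever values happen to be present, means a one-row newdata still carries all three levels.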
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
Of Rui Barradas
Sent: Tuesday, October 23, 2012 11:16 AM
To: asafwe
Cc: r-help at r-project.org
Subject: Re: [R] Data type in a data frame
Hello,
When read into a data.frame, R defaults to reading character strings as
factors. If you don't want that, use option stringsAsFactors = FALSE.
Using your dataset,
dat1 <- read.table(text = "
Observation Gender Dosage Alertness
1 m a 8
2 m a 12
3 m a 13
4 m a 12
5 m b 6
6 m b 7
", header = TRUE)
str(dat1)
dat2 <- read.table(text = "
Observation Gender Dosage Alertness
1 m a 8
2 m a 12
3 m a 13
4 m a 12
5 m b 6
6 m b 7
", header = TRUE, stringsAsFactors = FALSE)
str(dat2)
The default for stringsAsFactors is taken from the following option, which you can change:
options("stringsAsFactors")
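Because the default can differ between R versions and sessions, it may be safer to pass stringsAsFactors explicitly and then check the column classes. The sketch below uses a shortened copy of the table; the names txt, asFactor and asChar are made up for this illustration.

```r
# A shortened copy of the table from the question.
txt <- "Observation Gender Dosage Alertness
1 m a 8
2 m a 12
5 m b 6
6 m b 7"

# State the choice explicitly instead of relying on the session default.
asFactor <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)
asChar   <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# Inspect how each column was read.
sapply(asFactor, class)
sapply(asChar, class)

# A single column can also be converted after reading.
asChar$Dosage <- factor(asChar$Dosage, levels = c("a", "b"))
```

Only the string columns (Gender and Dosage) are affected by the argument; the numeric columns are read as integers either way.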
Hope this helps,
Rui Barradas
On 23-10-2012 15:43, asafwe wrote:
Hi all,

How does R know to regard a variable as a factor and not a character?
For example, consider the following table:

Observation Gender Dosage Alertness
1 m a 8
2 m a 12
3 m a 13
4 m a 12
5 m b 6
6 m b 7

When read into a dataframe, will "m", "a", "b" be regarded as a factor or as a character? How does R decide?

Thanks a lot in advance,
Asaf

--
View this message in context: http://r.789695.n4.nabble.com/Data-type-in-a-data-frame-tp4647161.html
Sent from the R help mailing list archive at Nabble.com.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.