
Data type in a data frame

3 messages · asafwe, Rui Barradas, William Dunlap

#
Hi all,

How does R know to regard a variable as a factor and not a character?
For example, consider the following table:

Observation   Gender   Dosage   Alertness
1             m        a                8
2             m        a               12
3             m        a               13
4             m        a               12
5             m        b                6
6             m        b                7

When this is read into a data frame, will "m", "a" and "b" be treated as
factor levels or as character strings? How does R decide?

Thanks a lot in advance,

Asaf



#
Hello,

When data are read into a data.frame, R versions before 4.0.0 convert
character strings to factors by default (from R 4.0.0 on, the default is
stringsAsFactors = FALSE). If you don't want factors, pass the argument
stringsAsFactors = FALSE. Using your dataset,


dat1 <- read.table(text = "
Observation   Gender  Dosage  Alertness
1             m       a               8
2             m       a              12
3             m       a              13
4             m       a              12
5             m       b               6
6             m       b               7
", header = TRUE)
str(dat1)

dat2 <- read.table(text = "
Observation   Gender  Dosage  Alertness
1             m       a               8
2             m       a              12
3             m       a              13
4             m       a              12
5             m       b               6
6             m       b               7
", header = TRUE, stringsAsFactors = FALSE)
str(dat2)


The default is decided by the following setting, which you can change:

options("stringsAsFactors")
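
If you would rather not depend on the session default at all, a minimal sketch along these lines may help. It passes stringsAsFactors explicitly and then checks the resulting column classes. The object names (txt, as_fac, as_chr, mixed) and the shortened three-row table are only illustrative; colClasses is a standard read.table() argument that sets the type of each column individually.

```r
# Shortened copy of the table from the question (illustrative only).
txt <- "
Observation   Gender  Dosage  Alertness
1             m       a               8
2             m       a              12
5             m       b               6
"

# Passing the argument explicitly means the result does not depend on
# the R version or on the session's options().
as_fac <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)
as_chr <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# Inspect how each column was read.
print(sapply(as_fac, class))
print(sapply(as_chr, class))

# Per-column control: one class per column, in column order.
mixed <- read.table(text = txt, header = TRUE,
                    colClasses = c("integer", "factor", "character", "numeric"))
print(sapply(mixed, class))
```

With colClasses you can keep some string columns as factors (here Gender) and leave others as plain character (here Dosage) in a single call.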

Hope this helps,

Rui Barradas
On 23-10-2012 15:43, asafwe wrote:
#
This is somewhat tangential, but if you plan on using
  predict(fit,newdata=nd)
after fitting a model like
  fit <- lm(y~x, data=d)
be sure you have converted character columns in nd and d into factors.
Otherwise you are likely to get errors from predict().   You will get
a warning when fitting the model if you use character columns, but
the results are ok until you use predict() on the result.

E.g.,
Warning message:
In model.matrix.default(mt, mf, contrasts) :
  variable 'cGroup' converted to a factor
cGroupA cGroupB cGroupC 
    2.0     5.5     8.5
fGroupA fGroupB fGroupC 
    2.0     5.5     8.5
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels
In addition: Warning message:
In model.matrix.default(Terms, m, contrasts.arg = object$contrasts) :
  variable 'cGroup' converted to a factor
10 
8.5
Error in predict.lm(fitChar, newdata = d[c(1, 10), ]) : 
  subscript out of bounds
In addition: Warning message:
In model.matrix.default(Terms, m, contrasts.arg = object$contrasts) :
  variable 'cGroup' converted to a factor
1  10 
2.0 8.5
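
The commands behind the output above are not shown here, so the following is not a reconstruction of them. It is only a minimal sketch of the advice itself, using made-up data: the data frame d, its columns y and group, and the names is_chr, fit, nd and pred are all hypothetical. It converts character columns to factors before fitting and builds newdata with the same factor levels as the training data.

```r
# Hypothetical data, not the data from the original example.
d <- data.frame(
  y = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  group = rep(c("A", "B", "C"), times = c(3, 4, 3)),
  stringsAsFactors = FALSE
)

# Convert every character column to a factor before fitting.
is_chr <- vapply(d, is.character, logical(1))
d[is_chr] <- lapply(d[is_chr], factor)

# One coefficient per group (no intercept), i.e. the group means.
fit <- lm(y ~ group - 1, data = d)

# Build newdata with the full set of levels from the training data, so
# that a single-row newdata still carries levels "A", "B" and "C".
nd <- data.frame(group = factor("C", levels = levels(d$group)))
pred <- predict(fit, newdata = nd)
print(pred)
```

Carrying the training levels into nd is the key step: a one-row data frame holding only the string "C" would, once converted, have a one-level factor, which is the situation the contrasts error above complains about.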


Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com