Coercing character to factor
On Wed, 8 Mar 2000, Marc Feldesman wrote:
I just downloaded version 1.0.0 and several binary libraries (VR, rpart,
norm, stataread) - WinNT version. I then converted a file from Stata 6.0
to R format by using the stataread library. The file converts perfectly
and I was able to use the VR function lda on the dataframe without
difficulty. I then tried to use the same dataframe with RPART. The model
statement:
test.rp<-rpart(genus~x+y+z+a+b+c, data=mydata)
fails with the following error:
Error in model.frame(formula, rownames, variables, varnames, extras,
extranames, :
invalid variable type
(the identical model statement works perfectly in lda)
I've traced the error to how RPART (or R) deals with the dependent variable
"genus", which is converted from a Stata file to an R file as a "character"
variable.
Yes. stataread reads Stata string variables as character. I think this is the right thing to do, since Stata recommends that you use numeric variables with labels if you really just have factors and generally doesn't encourage the use of strings.

model.frame doesn't allow character variables. It would be possible for model.frame to coerce characters to factors, but it currently doesn't. We would certainly recommend that you explicitly coerce strings to factors rather than having it happen automatically, but it may be that model.frame should handle strings as a fallback position (perhaps with a warning).

rpart is completely blameless :)

-thomas

Thomas Lumley
Assistant Professor, Biostatistics
University of Washington, Seattle
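For a data frame with several string columns, the explicit coercion recommended above can be done in one pass. The following is a minimal sketch, not part of the original exchange: the data frame is a made-up stand-in for the file read by stataread, the values are invented for illustration, and the code assumes a reasonably recent R (it uses vapply, which R 1.0.0 did not have).

```r
# Hypothetical stand-in for the data frame produced by stataread;
# the column names follow the post, the values are invented.
mydata <- data.frame(
  genus = c("Homo", "Pan", "Gorilla", "Pan"),
  x = c(1.2, 3.4, 5.6, 3.1),
  y = c(0.5, 0.7, 0.9, 0.6),
  stringsAsFactors = FALSE
)

# Flag the character columns, then convert each of them to a factor.
is_char <- vapply(mydata, is.character, logical(1))
mydata[is_char] <- lapply(mydata[is_char], factor)

# Inspect the result: genus should now be a factor, x and y stay numeric.
str(mydata)
```

After this step the original formula, genus~x+y+z+a+b+c, can be passed to a modelling function without wrapping the response in as.factor().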
The model statement works fine if I do:
test.rp<-rpart(as.factor(genus)~x+y+z+a+b+c, data=mydata)
or
mydata[,2]<-as.factor(mydata[,2])
test.rp<-rpart(genus~x+y+z+a+b+c, data=mydata)
Is this an R, RPART, or stataread issue? Where did I think I read that R coerced character variables to factors if the context called for factor variables?
=====================
Dr. Marc R. Feldesman
Professor and Chairman
Anthropology Department
Portland State University
1721 SW Broadway
Portland, Oregon 97201
email: feldesmanm at pdx.edu
phone: 503-725-3081
fax: 503-725-3905
http://odin.cc.pdx.edu/~h1mf
======================
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._