lda in R vs S
On Thu, 6 May 1999, Marc R. Feldesman wrote:
At 09:24 PM 5/6/1999 +0100, Prof Brian D Ripley wrote:
I'm running a discriminant analysis in R (0.64.1) to compare it with SPlus
That's not released until tomorrow! I guess you have the pre-release, prerw0641, which is actually of 0.64.0.
Yes. Actually the pre-release of 0.64.1
4.5R2. The following command line works fine in SPlus but gives an error in R. I've only used R for a little while so I'm not certain here what R (or lda) is complaining about. The dependent variable (sarich.na[,3]) is an alpha categorical variable, if that makes a difference. I'm using
What's that? The response ought to be a factor, according to the docs:
SAS & SPSS speak. Alpha categorical variable = factor.
 formula: A formula of the form `groups ~ x1 + x2 + ...'
          That is, the response is the grouping factor and
          the right hand side specifies the (non-factor)
          discriminators.
version VR5.3 (file name VR5.3pl037.zip).
  lda.out<-lda(sarich.na[,3]~., data=sarich.na[,4:32])
  Error in model.frame(formula, rownames, variables, varnames, extras, extranames,  :
          invalid variable type
Is this an lda issue or an R issue?
It is an R issue. Only logical, integer and real variables are allowed in R model frames, for as the code says
I haven't delved deeply into R internals yet. I just started experimenting with it as I was learning SPlus in parallel. So at the present time, even though sarich.na[,3] *is* a factor but with alpha levels, are you saying that R won't allow this?
It will allow factors: they get coerced to integers. I think from the evidence later that sarich.na[,3] is not a factor, even if it looks like one.
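A quick way to test whether a column really is a factor, rather than a character vector that merely prints like one, is sketched below. The data frame `d` and its column names are hypothetical stand-ins (sarich.na itself is not shown in this thread); only the level names "AINU" and "BUSHMAN" come from the discussion.

```r
# Hypothetical stand-in for the poster's data; the real sarich.na is not shown here.
d <- data.frame(group = c("AINU", "BUSHMAN", "AINU", "BUSHMAN"),
                x1    = c(1.2, 3.4, 1.1, 3.6),
                stringsAsFactors = FALSE)

is.factor(d$group)    # FALSE: a character column, even though it looks categorical
class(d$group)        # "character"

# An explicit conversion yields a genuine factor.
d$group <- factor(d$group)
is.factor(d$group)    # TRUE
levels(d$group)       # the distinct labels, sorted
as.integer(d$group)   # the integer codes a factor is stored as
```

If `is.factor()` returns FALSE for the response, converting it with `factor()` before calling a formula-based function is the conservative fix.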
    /* Sanity checks to ensure that the the answer can become */
    /* a data frame.  Be deeply suspicious here! */
Deeply suspicious of what?
Of things that look like factors? (I don't know, I didn't write this.)
But that is not the `right' way to do this in either. Use either
Either? Are you saying that the formulation above isn't correct in *either* R or SPlus? It works fine in SPlus (and sarich.na[,3] is coded as a factor with levels "AINU", "BUSHMAN", etc...). But, SPlus also allows sarich.na[,3] to be on the left side even if it isn't an explicit factor.
I am saying that it is legal in S-PLUS but poor style, and likely to cause methods (e.g. for prediction) to fail. In neither dialect is it what the designers intended.
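As an illustration of the style point only (not the code originally suggested in this message), here is a minimal sketch using the iris data: the grouping factor and the discriminators are columns of a single data frame and the formula refers to them by name, which is what lets a later `predict()` call find the same variables in new data. It assumes the MASS package, which provides `lda()`, is installed.

```r
library(MASS)  # lda() is provided by the MASS package

# The grouping factor and the discriminators are columns of one data frame,
# and the formula refers to them by name.
fit <- lda(Species ~ ., data = iris)

# Because the formula uses column names, predict() can look the
# discriminators up in a new data frame that has the same columns.
new_obs <- iris[c(1, 51, 101), 1:4]
pred <- predict(fit, newdata = new_obs)
pred$class
```

With a response written as `sarich.na[,3]`, the fitted terms carry no column name that prediction code could match against new data.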
Even if it is coded only as a character variable, SPlus allows it, lda calculates the results, and gives the correct answers. Presumably if this isn't the "correct" approach, SPlus or lda is coercing the character variable to a factor. This also works in aov and other functions that take a formula.
Yes, S-PLUS coerces character vars in model frames to factor, and I believe R does not allow them. Here is a simple experiment. R:
  data(iris)
  names(iris)
  [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"
  species <- as.character(iris$Species)
  lda(species ~ . - Species, data=iris)
  Error in model.frame(formula, rownames, variables, varnames, extras, extranames,  :
          invalid variable type
  lda(Species ~ ., data=iris)
  lda(as.matrix(iris[, 1:4]), species)
works fine. It looks like R is having problems with data frames here that I will have to look into. In R a data frame is not a matrix, and much less coercion gets done.
Brian
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._