GLM with Numeric and Factor as an Input
On 26/02/14 01:40, Lorenzo Isella wrote:
Dear All,

Please consider the snippet at the end of the email. It is representative of the problems I am experiencing.

I am trying to use glm (without using the formula interface because the original data is quite large) to model the response in a case where the predictors are a mix of numbers and factors. In the end, I always end up with an error message, despite having tried different choices for the "family" parameter.

Maybe I am missing the obvious, but can anyone run glm with a combination of numbers and factors?

Any help is appreciated.

Cheers

Lorenzo

###############################################################

set.seed(1234)
x <- rnorm(1000)
dim(x) <- c(100,10)
x <- as.data.frame(x)
names(x) <- LETTERS[seq(10)]
x$J <- round(x$J)
x$J <- as.factor(x$J)
y <- x$A
x <- subset(x, select=-c(A))
model <- glm.fit(x,y)  ## , family=gaussian
From the help for glm.fit:
For glm.fit: x is a ***design*** matrix of dimension n * p, and y is a vector of observations of length n.
(Emphasis mine.)

So if you want to (or insist on) using glm.fit() rather than glm(), you will have to construct your own design matrix. That is, replace each factor column by k-1 columns of dummy variables (where k is the number of levels of the given factor).

Note that "x" should really be a *matrix*, not a data frame, although it seems that data frames (all of whose columns are numeric) get coerced to matrices, so it doesn't matter much.

cheers,

Rolf Turner
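For what it's worth, here is a minimal sketch of the design-matrix route applied to the snippet from the question. It assumes that base R's model.matrix() with the default treatment contrasts is an acceptable way to build the k-1 dummy columns (it also adds an intercept column); the names X, fit and fit2 are only illustrative, and the glm() call is there purely as a cross-check.

```r
set.seed(1234)
x <- rnorm(1000)
dim(x) <- c(100, 10)
x <- as.data.frame(x)
names(x) <- LETTERS[seq(10)]
x$J <- as.factor(round(x$J))
y <- x$A
x <- subset(x, select = -A)

# Build a numeric design matrix: intercept, the numeric columns B..I,
# and k-1 dummy columns for the factor J (default treatment contrasts).
X <- model.matrix(~ ., data = x)

# glm.fit() now receives a proper numeric matrix.
fit <- glm.fit(X, y, family = gaussian())

# Cross-check against the formula interface, which builds the same matrix.
fit2 <- glm(y ~ ., data = x, family = gaussian())
all.equal(unname(fit$coefficients), unname(coef(fit2)))
```

If the intercept is not wanted, it can be dropped in the formula passed to model.matrix(), at the cost of changing how the factor is coded.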