[R-pkg-devel] Determine subset from glm object

On 08/07/2018 11:48 AM, Charles Geyer wrote:
You don't want the "as.integer".  If the dataframe had rownames to start
with, the x component of the fit will use those rownames as its row
labels, so as.integer may fail.  Even if it doesn't, the rownames
aren't necessarily sequential integers.  You can index the dataframe by
the character versions of the default numbers, so simply
rownames(gout$x) should always work.
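
A minimal sketch of that point, using a made-up dataframe (dat, fit, labs,
used and ints are hypothetical names, not from the original question).  The
rownames are non-numeric, so as.integer gives NA, while indexing by the
character labels recovers the rows that were used:

```r
# Hypothetical data with non-numeric row names.
dat <- data.frame(x = 1:6, y = c(0, 1, 0, 1, 1, 0),
                  row.names = c("a", "b", "c", "d", "e", "f"))
fit <- glm(y ~ x, family = binomial, data = dat, subset = x > 2, x = TRUE)

labs <- rownames(fit$x)   # character labels of the rows used in the fit
used <- dat[labs, ]       # character indexing matches on row names

# as.integer() on these labels gives NA (with a warning), so the result
# cannot be used to index the dataframe.
ints <- suppressWarnings(as.integer(labs))
```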

More generally, I'm not sure your question is well posed.  What do you 
mean by "the subsetting"?  If you have something like

df <- data.frame(letters, x = 1:26, y = rbinom(26, 1, 0.5))

df1 <- subset(df, letters > "b" & letters < "y")

gout <- glm(y ~ x, data = df1, subset = letters < "q", x = TRUE)

the rownames(gout$x) are going to be numbers for rows of df, because df1 
will get a subset of those as row labels.
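
Here is the same example made reproducible, as a rough check rather than a
definitive test.  The seed is arbitrary, and labs and pos are names I made up
for illustration.  It shows that the labels match rows of df by name, and
that treating them as positions within df1 picks different rows:

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible
df <- data.frame(letters, x = 1:26, y = rbinom(26, 1, 0.5))
df1 <- subset(df, letters > "b" & letters < "y")
gout <- glm(y ~ x, data = df1, subset = letters < "q", x = TRUE)

labs <- rownames(gout$x)

# The labels are row names carried over from df, so they select the
# same rows whether we index df or df1 by name.
all(labs %in% rownames(df1))
identical(df[labs, "x"], df1[labs, "x"])

# Used as integer positions within df1 they select different rows,
# because df1 no longer starts at row 1 of df.
pos <- as.integer(labs)
identical(df1[pos, "x"], df[labs, "x"])
```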

You should be able to evaluate the subset expression in the dataframe,
with the environment of the formula as the enclosure, i.e.

eval(gout$call$subset, envir = gout$data, enclos = environment(gout$formula))

This may give incorrect results if the variables used in subsetting 
aren't in the dataframe and have changed since glm() was called.
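
A sketch of that failure mode, under assumptions of my own: cutoff is a
hypothetical variable kept outside the dataframe, and keep and keep2 are
made-up names.  The subset is evaluated with the dataframe searched first
and the formula's environment as the enclosure, which is how model.frame()
looks up names in a subset expression:

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible
df <- data.frame(letters, x = 1:26, y = rbinom(26, 1, 0.5))
cutoff <- 10  # hypothetical variable that is not a column of df
gout <- glm(y ~ x, data = df, subset = x < cutoff, x = TRUE)

# Names are looked up in the data first, then in the formula's environment.
keep <- eval(gout$call$subset, envir = gout$data,
             enclos = environment(gout$formula))
sum(keep) == nrow(gout$x)   # agrees with the rows used in the fit

# If cutoff changes after the fit, re-evaluating the stored call no
# longer reproduces the rows that glm() actually used.
cutoff <- 20
keep2 <- eval(gout$call$subset, envir = gout$data,
              enclos = environment(gout$formula))
sum(keep2) == nrow(gout$x)
```

Comparing the result against rownames(gout$x), when x = TRUE was used, is one
way to notice that the two have drifted apart.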

I would trust evaluating the subset more than grabbing row labels from
gout$x, but I don't know for sure that it is more robust.

Duncan Murdoch