On Jan 6, 2017, at 11:03 AM, Jacob Wegelin <jacobwegelin at fastmail.fm> wrote:
Given any regression model, created for instance by lm, lme, lmer, or rqs, such as
z1<-lm(weight~poly(Time,2), data=ChickWeight)
I would like a general way to obtain only those variables used for the model. In the current example, this "minimal data frame" would consist of the "weight" and "Time" variables and none of the other columns of ChickWeight.
(Motivation: Sometimes the data frame contains thousands of variables which are not used in the current regression, and I do not want to keep copying and propagating them.)
The "model" component of the regression object doesn't serve this purpose:
head(z1$model)
  weight poly(Time, 2).1 poly(Time, 2).2
1     42    -0.066020938     0.072002235
2     51    -0.053701293     0.031099018
3     59    -0.041381647    -0.001334588
4     64    -0.029062001    -0.025298582
5     76    -0.016742356    -0.040792965
6     93    -0.004422710    -0.047817737
The following awkward workaround seems to do it when variable names contain only "word characters" as defined by regex:
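minimalvariablesfrommodel20161120 <- function(object, originaldata) {
  # stopifnot(!missing(originaldata))
  stopifnot(!missing(object))
  # Split the deparsed formula on non-word characters and keep only the
  # tokens that are column names of originaldata.
  intersect(
    unique(unlist(strsplit(format(object$call$formula), split = "\\W", perl = TRUE))),
    names(originaldata)
  )
}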
minimalvariablesfrommodel20161120(z1, ChickWeight)
But if a variable has a space in its name, my workaround fails:
ChickWeight$"dog tail"<-ChickWeight$Time
z1<-lm(weight~poly(`dog tail`,2), data=ChickWeight)
head(z1$model)
  weight poly(`dog tail`, 2).1 poly(`dog tail`, 2).2
1     42          -0.066020938           0.072002235
2     51          -0.053701293           0.031099018
3     59          -0.041381647          -0.001334588
4     64          -0.029062001          -0.025298582
5     76          -0.016742356          -0.040792965
6     93          -0.004422710          -0.047817737
minimalvariablesfrommodel20161120(z1, ChickWeight)
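To make the failure concrete, here is a small sketch of the tokenizing step on its own. It uses a copy of the data (cw) and a second model object (z2), both names chosen only for this illustration, and it deparses the formula with deparse(formula(...)) rather than format(object$call$formula), which I assume gives the same text here.

```r
# Work on a copy so the built-in ChickWeight data set is left alone.
cw <- ChickWeight
cw$"dog tail" <- cw$Time
z2 <- lm(weight ~ poly(`dog tail`, 2), data = cw)

# Turn the formula into text and split on every non-word character,
# as the workaround does.
tokens <- unique(unlist(strsplit(deparse(formula(z2)), split = "\\W", perl = TRUE)))
print(tokens)

# The space inside `dog tail` is a non-word character, so "dog" and "tail"
# become separate tokens and neither matches a column name.
found <- intersect(tokens, names(cw))
print(found)
```

Only "weight" survives the intersect, so the "dog tail" column is silently dropped rather than reported as an error.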
Is there a more elegant, and hence more reliable, approach?
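One direction that may be worth trying is base R's all.vars(), which reads variable names from the parsed formula instead of from its text. The sketch below is only that: model_vars is a hypothetical helper name, cw and z2 are illustrative objects, and it assumes the fitted object has a formula() method that returns every part of the model of interest. That assumption may not hold for all fitting functions (for lme, for example, formula() may return only the fixed-effects part).

```r
# Hypothetical helper, not from any package: names of the columns of
# `data` that appear as variables in the model formula.
model_vars <- function(object, data) {
  # all.vars() works on the parsed expression, so a backquoted name such as
  # `dog tail` comes back as the single string "dog tail".
  intersect(all.vars(formula(object)), names(data))
}

cw <- ChickWeight
cw$"dog tail" <- cw$Time
z2 <- lm(weight ~ poly(`dog tail`, 2), data = cw)

vars <- model_vars(z2, cw)
print(vars)

# Subset to the "minimal data frame"; as.data.frame() drops the extra
# groupedData classes that ChickWeight carries.
minimal <- as.data.frame(cw)[vars]
print(names(minimal))
```

The intersect() with names(data) is kept from the workaround so that symbols in the formula that are not columns (constants or objects from the calling environment) are ignored.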
Thanks
Jacob A. Wegelin