How does predict.lm work?
On 09/09/2008 09:59 AM Williams, Robin wrote:
Hi, Please could someone explain how this element of predict.lm works?
From the help file
"newdata: An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used."
Does this data frame (newdata) need to have the same variable names as were used in the original data frame used to fit the model?
Yes. Also, see the Note in ?predict.lm: "Variables are first looked for in newdata and then searched for in the usual way (which will include the environment of the formula used in the fit). A warning will be given if the variables found are not of the same length as those in newdata if it was supplied." Note that it says "Variables", not columns: they are matched by name, not by position.
Or will R just look across consecutive columns of newdata, and apply them to the call as appropriate?
No.
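To illustrate that column position plays no role, here is a minimal sketch on simulated data. The data frames, coefficients, and object names (train, new_a, new_b) are made up for this example and are not from the original question.

```r
# Simulated training data; names and values are illustrative only.
set.seed(42)
train <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
train$y <- 1 + 2 * train$x1 - 3 * train$x2 + rnorm(30, sd = 0.1)

fit <- lm(y ~ x1 + x2, data = train)

# Two versions of the same new data: identical columns, different order.
new_a <- data.frame(x1 = c(0, 1, 2), x2 = c(1, 0, -1))
new_b <- new_a[, c("x2", "x1")]

pred_a <- predict(fit, newdata = new_a)
pred_b <- predict(fit, newdata = new_b)

# The predictions agree because variables are looked up by name.
all.equal(pred_a, pred_b)
```

Reordering the columns of newdata leaves the predictions unchanged, so only the names need to match the model formula.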
For example, if I have fitted a model with four variables (x1,x2,x3,x4) in my original dataframe, and then have a second dataframe which I want to supply to the newdata argument in predict.lm with variable names (x5, x6, x7, x8), do I need to change the variable names in my newdata dataframe to match those of the original dataframe?
Yes.
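A small sketch of the renaming step, again on simulated data with hypothetical object names (train, new_raw, new_ok). It assumes that no objects called x1 or x2 exist in the environment where the model was fitted; if they did, predict() would find those instead, as the Note in ?predict.lm describes.

```r
# Simulated training data; names and values are illustrative only.
set.seed(1)
train <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
train$y <- 1 + 2 * train$x1 - train$x2 + rnorm(20, sd = 0.1)

fit <- lm(y ~ x1 + x2, data = train)

# New data whose columns carry different names (x5, x6).
new_raw <- data.frame(x5 = rnorm(5), x6 = rnorm(5))

# With mismatched names, x1 and x2 cannot be found in newdata,
# so this call is expected to fail under the assumption above.
bad <- try(predict(fit, newdata = new_raw), silent = TRUE)
inherits(bad, "try-error")

# Rename the columns to the names used in the model formula.
new_ok <- setNames(new_raw, c("x1", "x2"))

# One prediction per row of new_ok.
pred <- predict(fit, newdata = new_ok)
length(pred)
```

For larger data frames, keeping the predictor names identical to those used in the fit avoids the renaming step altogether.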
Or will R treat x5 as x1, x6 as x2, etc., when using predict.lm? I would like to know so that I can design the structure of some somewhat larger data frames in a manner which will make using predict.lm straightforward and quick. Hope this makes sense. Many thanks for any help.
HTH, Marc Schwartz