
predict.lm if regression vector is longer than prediction vector

4 messages · frauke, S Ellison, William Dunlap +1 more

#
Hi everybody, 

Recently a member of the community pointed me to the useful predict.lm()
function. While I was toying with it, I stumbled across the following
problem.
I fit the regression with data from five years, but I want to make a
prediction with predict.lm for only one year. Thus the data frame I pass in
predict.lm(mod, newdata=dataframe) is shorter than the original vector I fit
the regression with. I get the following warning:
Warning message:
'newdata' had 365 rows but variable(s) found have 1825 rows 
Of course I can pad the new data frame with a few thousand NAs, but is
there a more elegant solution?

Thank you! Frauke



--
View this message in context: http://r.789695.n4.nabble.com/predict-lm-if-regression-vector-is-longer-than-predicton-vector-tp4644881.html
Sent from the R help mailing list archive at Nabble.com.
#
That should not be necessary: predict.lm should work on any number of newdata rows, whether longer or shorter than the original data set.
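To illustrate the point, here is a minimal sketch on simulated data standing in for the poster's five years of daily values. The column names temp and flow, and the simulated values, are hypothetical. When the model is fitted with data= and newdata carries the same column names, predict() simply returns one prediction per row of newdata.

```r
# Simulated stand-in for five years of daily data (1825 rows).
# The column names 'temp' and 'flow' are hypothetical.
set.seed(1)
five_years <- data.frame(temp = rnorm(1825, mean = 10, sd = 5))
five_years$flow <- 2 + 0.5 * five_years$temp + rnorm(1825)

# Fit with data= so the formula refers to column names only.
mod <- lm(flow ~ temp, data = five_years)

# newdata for a single year: 365 rows, same predictor name as in the formula.
one_year <- data.frame(temp = rnorm(365, mean = 10, sd = 5))
pred <- predict(mod, newdata = one_year)

length(pred)  # 365: one prediction per row of newdata
```

No padding with NAs is needed; the length of newdata is independent of the length of the data used for fitting.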

However, the help page for predict.lm says (among other things)

    "If the fit is rank-deficient, some of the columns of the design
     matrix will have been dropped.  Prediction from such a fit only
     makes sense if 'newdata' is contained in the same subspace as the
     original data.  That cannot be checked accurately, so a warning is
     issued."

Could that be the situation you are in? If it is, it's not the new data that causes the problem, but the original fit.
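A minimal sketch of the rank-deficient situation the help page describes, using hypothetical variables x1 and x2 where x2 is an exact multiple of x1. lm() then reports NA for one coefficient, and predicting at points that break that relationship is exactly the "outside the subspace" case. The wording of the resulting warning differs between R versions, so the sketch just prints whatever warning appears rather than relying on specific text.

```r
# Hypothetical rank-deficient fit: x2 is exactly 2 * x1, so only two of
# the three coefficients (intercept, x1, x2) can be estimated.
set.seed(2)
d <- data.frame(x1 = 1:20)
d$x2 <- 2 * d$x1
d$y  <- 3 + d$x1 + rnorm(20)

fit_rd <- lm(y ~ x1 + x2, data = d)
print(coef(fit_rd))  # the x2 coefficient is NA
print(fit_rd$rank)   # 2, although the model has 3 coefficients

# This newdata does not satisfy x2 = 2 * x1, so it lies outside the
# subspace of the original data. predict() still returns numbers, but a
# warning is expected; it is printed here and then muffled.
nd <- data.frame(x1 = c(5, 10, 15), x2 = c(0, 0, 0))
p_rd <- withCallingHandlers(
  predict(fit_rd, newdata = nd),
  warning = function(w) {
    message("Warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(p_rd)
```

In such a case the fix belongs in the model (dropping or recombining the collinear predictors), not in newdata.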

S Ellison

#
This can happen if your newdata data.frame does not include
all the predictors required by the formula in the model. In that
case predict will look in the environment of the model formula
(usually the global workspace) to find the missing predictors, and
those will generally not match what is in your newdata. E.g.,
1            2            3            4            5            6 
-0.008176128  0.725397589  1.089747865  1.361792281  1.596914353  1.813575253
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
  variable lengths differ (found for 'x2')
In addition: Warning message:
'newdata' had 5 rows but variable(s) found have 6 rows

Put all the required variables into newdata and things are fine:
1          2          3          4          5 
-0.0366699 -1.1321492 -2.3778906 -3.6469522 -4.7909516
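The commands behind the output above were not preserved, so the following is only a hypothetical reconstruction of the same kind of failure: a model fitted from workspace variables x1 and x2 (names assumed), then predicted with a newdata that omits x2. The missing predictor is picked up from the workspace with 6 values while newdata has 5 rows, which produces the "variable lengths differ" error.

```r
# Hypothetical reconstruction: fit from workspace variables (no data=).
set.seed(3)
x1 <- 1:6
x2 <- log(1:6)
y  <- x1 + x2 + rnorm(6, sd = 0.1)
fit <- lm(y ~ x1 + x2)

# newdata omits x2, so model.frame() falls back to the workspace copy of
# x2 (6 values) while newdata has 5 rows: predict() stops with an error.
res <- tryCatch(
  suppressWarnings(predict(fit, newdata = data.frame(x1 = 1:5))),
  error = function(e) conditionMessage(e)
)
print(res)

# Supplying every predictor used in the formula avoids the lookup.
pred_ok <- predict(fit, newdata = data.frame(x1 = 1:5, x2 = log(2:6)))
length(pred_ok)  # 5
```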

You can also get this problem if newdata is an environment or list
instead of a data.frame, because only data.frame forces all of
its components to have the same length.
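A small sketch of the difference between the two containers, with hypothetical columns x1 and x2: data.frame() refuses components of unequal length up front, while a list accepts them, so the mismatch only surfaces later inside predict().

```r
# data.frame() refuses columns of different lengths (5 vs 3) ...
df_try <- tryCatch(data.frame(x1 = 1:5, x2 = 1:3),
                   error = function(e) conditionMessage(e))
print(df_try)

# ... whereas a list accepts them without complaint.
nd_list <- list(x1 = 1:5, x2 = 1:3)
print(lengths(nd_list))

# Hypothetical training data, fitted with data= so no workspace lookup occurs.
set.seed(5)
train <- data.frame(x1 = 1:10, x2 = rnorm(10))
train$y <- train$x1 + train$x2 + rnorm(10, sd = 0.1)
fit_l <- lm(y ~ x1 + x2, data = train)

# The length mismatch is only detected when model.frame() builds the frame.
res_list <- tryCatch(predict(fit_l, newdata = nd_list),
                     error = function(e) conditionMessage(e))
print(res_list)
```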


Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
#
The most common case in which I see that error is when someone fits
their model using syntax like:

fit <- lm( mydata$y ~ mydata$x )

instead of the preferred method:

fit <- lm( y ~ x, data=mydata )

The fix (if this is what you did and why you are getting the error) is
to use the second, preferred form instead of the first.
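A minimal sketch of why the first form breaks prediction, on a hypothetical mydata with 50 rows. With mydata$y ~ mydata$x, predict() re-evaluates the expression mydata$x; newdata has no column called mydata, so the original 50 values are used, newdata is silently ignored apart from the "'newdata' had ... rows" warning, and you get the fitted values back. That matches the 365 versus 1825 rows in the original question.

```r
# Hypothetical data set with 50 rows.
set.seed(4)
mydata <- data.frame(x = 1:50)
mydata$y <- 1 + 2 * mydata$x + rnorm(50)

new <- data.frame(x = c(51, 52, 53))

# Discouraged form: the term is the expression mydata$x, which predict()
# re-evaluates and resolves to the original 50 x values, ignoring new$x.
fit_bad <- lm(mydata$y ~ mydata$x)
p_bad <- suppressWarnings(predict(fit_bad, newdata = new))
length(p_bad)   # 50: these are just the fitted values

# Preferred form: column names in the formula, data frame via data=.
fit_good <- lm(y ~ x, data = mydata)
p_good <- predict(fit_good, newdata = new)
length(p_good)  # 3: one prediction per row of newdata
```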
On Wed, Oct 3, 2012 at 8:37 AM, frauke <fhoss at andrew.cmu.edu> wrote: