Hi everybody,

recently a member of the community pointed me to the useful predict.lm() command. While I was toying with it, I stumbled across the following problem.

I do the regression with data from five years, but I want to do a prediction with predict.lm for only one year. Thus my dataframe for predict.lm(mod, newdata=dataframe) is shorter than the original vector that I did the regression with. It gives you the following error:

Warning message:
'newdata' had 365 rows but variable(s) found have 1825 rows

Of course I can extend the new dataframe with a few thousand NAs, but is there a more elegant solution?

Thank you!
Frauke

--
View this message in context: http://r.789695.n4.nabble.com/predict-lm-if-regression-vector-is-longer-than-predicton-vector-tp4644881.html
Sent from the R help mailing list archive at Nabble.com.
predict.lm if regression vector is longer than prediction vector
4 messages · frauke, S Ellison, William Dunlap +1 more
> Of course I can extend the new dataframe with a few thousand NAs, but is there a more elegant solution?
That should not be necessary: predict.lm should work on any number of newdata rows, whether longer or shorter than the original data set.
However, the help page for predict.lm says (among other things)
"If the fit is rank-deficient, some of the columns of the design
matrix will have been dropped. Prediction from such a fit only
makes sense if 'newdata' is contained in the same subspace as the
original data. That cannot be checked accurately, so a warning is
issued."
Could that be the situation you are in? If it is, it's not the new data that causes the problem, but the original fit.
S Ellison
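To make the rank-deficient case concrete, here is a minimal sketch. The data frame d and its columns are made up for illustration (they are not Frauke's data); x2 is deliberately an exact multiple of x1 so that lm() has to drop a column. Depending on the R version, predict() may issue a rank-deficiency warning here, and its wording differs from the "'newdata' had ... rows" message in the original post.

```r
# Hypothetical data: x2 is an exact multiple of x1, so the fit is rank-deficient.
set.seed(1)
d <- data.frame(x1 = 1:10)
d$x2 <- 2 * d$x1
d$y  <- 3 + 0.5 * d$x1 + rnorm(10)

fit <- lm(y ~ x1 + x2, data = d)
coef(fit)    # the x2 coefficient is NA because that column was dropped
fit$rank     # 2, although the formula asks for 3 coefficients

# Predicting for fewer rows than were used in the fit still works;
# any warning here concerns the rank deficiency, not the row count.
p <- predict(fit, newdata = data.frame(x1 = 1:3, x2 = 2 * (1:3)))
length(p)    # 3
```

If coef(fit) shows no NA entries for your own model, rank deficiency is probably not what you are seeing.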
This can happen if your newdata data.frame does not include all the predictors required by the formula in the model. In that case predict will look in the current evaluation environment to find the missing predictors, and those will generally not match what is in your newdata. E.g.,
  x1 <- 1:6
  x2 <- 1/(1:6)
  y <- log(1:6)
  fit <- lm(y ~ x1 + x2)
  predict(fit)
             1            2            3            4            5            6
  -0.008176128  0.725397589  1.089747865  1.361792281  1.596914353  1.813575253

  predict(fit, newdata=data.frame(x2=1:5)) # didn't supply x1
  Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
    variable lengths differ (found for 'x2')
  In addition: Warning message:
  'newdata' had 5 rows but variable(s) found have 6 rows

Put all the required variables into newdata and things are fine:

  predict(fit, newdata=data.frame(x2=1:5, x1=sin(1:5)))
           1          2          3          4          5
  -0.0366699 -1.1321492 -2.3778906 -3.6469522 -4.7909516

You can also get this problem if newdata is an environment or list instead of a data.frame, because only data.frame forces all of its components to have the same length.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
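One way to check for this before calling predict is to compare the predictor names in the fitted model's terms with the columns of newdata. The following is only a sketch built on the example above; the helper names needed and missing_vars are made up for illustration.

```r
x1 <- 1:6
x2 <- 1/(1:6)
y  <- log(1:6)
fit <- lm(y ~ x1 + x2)

# Names of the predictors the fitted formula refers to (response removed).
needed <- all.vars(delete.response(terms(fit)))

newdata <- data.frame(x2 = 1:5)                  # x1 is missing, as in the failing call
missing_vars <- setdiff(needed, names(newdata))
missing_vars                                     # "x1"

newdata$x1 <- sin(1:5)                           # supply the missing predictor
setdiff(needed, names(newdata))                  # character(0)
p <- predict(fit, newdata = newdata)
length(p)                                        # 5
```

Note that this check only compares names. It will not help when the model was fitted with terms such as mydata$x, because all.vars() then reports mydata and x separately.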
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of frauke
Sent: Wednesday, October 03, 2012 7:37 AM
To: r-help at r-project.org
Subject: [R] predict.lm if regression vector is longer than predicton vector

[...]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
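The most common case in which I see that error is when someone fits their model using syntax like:

  fit <- lm( mydata$y ~ mydata$x )

instead of the preferred method:

  fit <- lm( y ~ x, data=mydata )

With the first form, the predictor in the model is literally named mydata$x, so predict cannot find it in newdata and falls back to the full-length original data.

The fix (if this is what you did and why you are getting the error) is to not use the first way and instead use the second, preferred way.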
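A small sketch of the difference, using simulated data as a stand-in for the five years of daily values (mydata, oneyear and the coefficients are made up for illustration, not taken from the original post):

```r
set.seed(42)
# Hypothetical stand-in for five years of daily data (1825 rows).
mydata <- data.frame(x = rnorm(1825))
mydata$y <- 1 + 2 * mydata$x + rnorm(1825)
# Hypothetical one year of new predictor values (365 rows).
oneyear <- data.frame(x = rnorm(365))

# Fitted this way, the model's predictor is the expression mydata$x ...
bad <- lm(mydata$y ~ mydata$x)
# ... so newdata is effectively ignored: predict() warns that 'newdata' had
# 365 rows but variable(s) found have 1825 rows, and returns 1825 values.
p_bad <- predict(bad, newdata = oneyear)
length(p_bad)    # 1825

# Fitted with data=, the predictor is plain x and is looked up in newdata.
good <- lm(y ~ x, data = mydata)
p_good <- predict(good, newdata = oneyear)
length(p_good)   # 365
```

The column names in newdata must match the variable names used in the formula (here x) for the second form to work.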
On Wed, Oct 3, 2012 at 8:37 AM, frauke <fhoss at andrew.cmu.edu> wrote:
> [...]
Gregory (Greg) L. Snow Ph.D. 538280 at gmail.com