On Thu, 27 Mar 2003 Mark.Bravington@csiro.au wrote:
<Bravington wrote:>
#> `predict' complains about new factor levels, even if the "new" levels are
#> merely levels in the original that didn't occur in the original fit and were
#> sensibly dropped, and that don't occur in the prediction data either.

<Ripley replied:>
#This is intentional. The coding for factors is based on the full set of
#levels, and should be comparable for different prediction sets.
#
#If you are using factors with fictitious levels the fix is obvious:
#improve the design.

There is still an inconsistency bug between `lm' and `predict.lm', though. `lm' intentionally overlooks inactive levels of a factor, but `predict.lm'
Only if an argument is set, and originally lm did not do so.
doesn't, even when it legitimately could. In particular, it is a bit odd to have no problem predicting without a `newdata' argument even when the original data had inactive factor levels, but then to get an error if `newdata=<<original data>>' is supplied explicitly! (See example.)
Read again. predict.lm is consistent across its inputs: unlike lm it can take variable `newdata'. As I said the intention is to be consistent across *prediction sets*. Omitting newdata is not giving a prediction set.
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595