Message-ID: <Pine.LNX.4.44.0303260711550.6251-100000@gannet.stats>
Date: 2003-03-26T08:20:00Z
From: Brian Ripley
Subject: predict (PR#2686)
In-Reply-To: <200303252331.h2PNVHn9024938@pubhealth.ku.dk>
This is intentional. The coding for factors is based on the full set of
levels, and should be comparable for different prediction sets.
If you are using factors with fictitious levels the fix is obvious:
improve the design.
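
One way to act on that advice, as a minimal sketch: re-create the factor before fitting so that only observed levels remain. This assumes the unused level carries no meaning for the design. The data frame `d` and its values are made up for illustration and are not from the report.

```r
# Toy data modelled on the report: 'earwig' is declared but never observed.
d <- data.frame(
  y    = c(0.2, 0.5, 0.9),
  disc = factor(c("cat", "dog", "cat"), levels = c("cat", "dog", "earwig"))
)
levels(d$disc)            # all declared levels, including the unused one

# Re-creating the factor keeps only the levels that actually occur.
d$disc <- factor(d$disc)
levels(d$disc)            # "cat" "dog"

# Fit and predict on data whose factor levels match the design.
fit <- lm(y ~ disc, data = d)
fit$xlevels$disc          # levels the fit was coded on
pred <- predict(fit, newdata = d)
```

The fit and any later prediction set then share one explicit set of levels, so the coding stays comparable between them.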
On Wed, 26 Mar 2003 Mark.Bravington@csiro.au wrote:
> # r-bugs@r-project.org
>
> `predict' complains about new factor levels, even if the "new" levels are
> merely levels in the original that didn't occur in the original fit and were
> sensibly dropped, and that don't occur in the prediction data either. (At
> least if `drop.unused.levels' was set to TRUE, which is the default.)
Actually, the default is FALSE: see args(model.frame.default). lm and glm
call model.frame.default with non-default args.
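
A short sketch of how this can be checked, assuming a toy data frame `d` (hypothetical, modelled on the example in the report): inspect the formal default of the argument, then compare the levels kept by a direct model.frame call with those kept in the frame that lm stores.

```r
# Formal default of the argument in the default method.
mf_fun <- getS3method("model.frame", "default")
default_drop <- formals(mf_fun)$drop.unused.levels
default_drop

d <- data.frame(
  y    = c(0.2, 0.5, 0.9),
  disc = factor(c("cat", "dog", "cat"), levels = c("cat", "dog", "earwig"))
)

# Called directly with defaults, the unused level is kept.
mf_default <- model.frame(y ~ disc, data = d)
levels(mf_default$disc)

# Asking for unused levels to be dropped removes it.
mf_drop <- model.frame(y ~ disc, data = d, drop.unused.levels = TRUE)
levels(mf_drop$disc)

# lm() builds its model frame with non-default arguments;
# the stored frame shows which levels survived.
fit <- lm(y ~ disc, data = d)
levels(model.frame(fit)$disc)
```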
> test> scrunge.data.2 <- data.frame( y=runif( 3), disc=factor( c( 'cat', 'dog',
> 'cat'), levels=c( 'cat', 'dog', 'earwig')))
> test> lm.predbug.2 <- lm( y~disc, data=scrunge.data.2)
> test> predict(lm.predbug.2, newdata=scrunge.data.2)
> Error in model.frame.default(object, data, xlev = xlev) :
> factor disc has new level(s) earwig
>
>
> A cure for this seems to be to add the commented line below towards the end
> of `model.frame.default':
>
> <<...>>
>     if (length(xlev) > 0) {
>         for (nm in names(xlev)) if (!is.null(xl <- xlev[[nm]])) {
>             xi <- data[[nm]]
>             if (is.null(nxl <- levels(xi)))
>                 warning(paste("variable", nm, "is not a factor"))
>             else {
>                 xi <- xi[, drop = TRUE]
>                 nxl <- levels( xi) # MVB: remove droppees
>                 if (any(m <- is.na(match(nxl, xl))))
>                     stop(paste("factor", nm, "has new level(s)", nxl[m]))
>             }
>         }
>     }
>     else if (drop.unused.levels) {
> <<...>>
>
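
For readers who want to see the check in the quoted fragment in isolation, here is a standalone sketch. The helper name `new_levels` and its `drop_unused` argument are hypothetical and are not part of R. It only mirrors the match() test quoted above, with and without first dropping unused levels.

```r
# Hypothetical helper: which levels of factor `xi` are absent from the
# reference level set `xl`?
new_levels <- function(xi, xl, drop_unused = FALSE) {
  if (drop_unused) {
    xi <- xi[, drop = TRUE]   # discard levels with no observations
  }
  nxl <- levels(xi)
  nxl[is.na(match(nxl, xl))]
}

xl <- c("cat", "dog")   # levels recorded at fit time
xi <- factor(c("cat", "dog", "cat"), levels = c("cat", "dog", "earwig"))

strict  <- new_levels(xi, xl)                      # "earwig"
relaxed <- new_levels(xi, xl, drop_unused = TRUE)  # character(0)
```

The strict form compares the full declared level set, which is the behaviour described as intentional at the top of this message. The relaxed form compares only the levels that actually occur in the data.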
> cheers
> Mark
>
> *******************************
>
> Mark Bravington
> CSIRO (CMIS)
> PO Box 1538
> Castray Esplanade
> Hobart
> TAS 7001
>
> phone (61) 3 6232 5118
> fax (61) 3 6232 5012
> Mark.Bravington@csiro.au
>
> --please do not edit the information below--
>
> Version:
> platform = i386-pc-mingw32
> arch = i386
> os = mingw32
> system = i386, mingw32
> status =
> major = 1
> minor = 6.2
> year = 2003
> month = 01
> day = 10
> language = R
>
> Windows 2000 Professional (build 2195) Service Pack 3.0
>
> Search Path:
> .GlobalEnv, ROOT, package:handy, package:debug, mvb.session.info,
> package:mvbutils, package:tcltk, Autoloads, package:base
>
--
Brian D. Ripley, ripley@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595