-----Original Message-----
From: r-help-bounces@stat.math.ethz.ch
[mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Liaw, Andy
Sent: Tuesday, February 15, 2005 8:31 AM
To: 'Prof Brian Ripley'
Cc: r-help@stat.math.ethz.ch; 'Markus Jantti'
Subject: RE: [R] using poly in a linear regression in the
presence of NAf ails (despite subsetting them out)
My apologies: It's another case of me not thinking
statistically... It may also help those of us whose brains
run at slow clock speeds to have ?poly, ?bs and ?ns mention
how they react to NAs.
Best,
Andy
From: Prof Brian Ripley
Andy,
I don't think it is a bug. The problem is that poly(x, 2)
the possible set of x values, and so needs to know all of
e.g. log(x) which is observation-by-observation. Silently omitting
missing values is not a good idea in such cases, especially if the
values are missing in other variables (which is what na.action is
likely to do).
I would say models with poly, ns, bs etc are inadvisable in the
presence of missing values in their argument. We could make poly()
give an informative message, though.
Brian
On Mon, 14 Feb 2005, Liaw, Andy wrote:
This smells like a bug to me. The error is triggered by the line:
variables <- eval(predvars, data, env)
inside model.frame.default(). At that point, na.action
applied, so poly() ended being called on data that still
values. The qr() that issued the error is for generating
basis when evaluating poly(), not for fitting the linear
Essentially, calling
model.frame(y ~ poly(x, 2), data=data.frame(x=c(NA, 1:3),
na.action=na.omit)
would show the same error. The obvious workaround is to
before calling lm().
Andy
From: Markus J?ntti
I ran into a to me surprising result on running lm with an
polynomial among the predictors.
The lm command resulted in
Error in qr(X) : NA/NaN/Inf in foreign function call
during wrapup:
despite my using a "subset" in the call to get rid of NA's.
poly is apparently evaluated before any NA's are
the data.
Example code (attached to this e-mail as as a script):
## generate some data
n <- 50
x <- runif(n)
a0 <- 10
a1 <- .5
sigma.e <- 1
y <- a0 + a1*x + rnorm(n)*sigma.e tmp.d <- data.frame(y, x)
rm(list=c("n", "x", "a0", "a1", "sigma.e", "y"))
print(lm.1 <- lm(y ~ poly(x, 2), data = tmp.d)
+
+ ## now make a few NA's
+
+ tmp.d$x[1:2] <- rep(NA, 2)
Error: syntax error
Error during wrapup:
## this fails, just as it should
print(lm.1 <- lm(y ~ poly(x, 2), data = tmp.d))
Call:
lm(formula = y ~ poly(x, 2), data = tmp.d)
Coefficients:
(Intercept) poly(x, 2)1 poly(x, 2)2
10.380 -0.242 -1.441
## these also fail, but should not?
print(lm.2 <- lm(y ~ poly(x, 2), data = tmp.d, subset =
Call:
lm(formula = y ~ poly(x, 2), data = tmp.d, subset = !is.na(x))
Coefficients:
(Intercept) poly(x, 2)1 poly(x, 2)2
10.380 -0.242 -1.441
print(lm.3 <- lm(y ~ poly(x, 2), data = tmp.d, na.action =
na.omit))
Call:
lm(formula = y ~ poly(x, 2), data = tmp.d, na.action = na.omit)
Coefficients:
(Intercept) poly(x, 2)1 poly(x, 2)2
10.380 -0.242 -1.441
## but this works
print(lm.3 <- lm(y ~ poly(x, 2), data = subset(tmp.d, subset =
!is.na(x))))
Call:
lm(formula = y ~ poly(x, 2), data = subset(tmp.d, subset =
Coefficients:
(Intercept) poly(x, 2)1 poly(x, 2)2
10.380 -0.242 -1.441
--------------------
The documentation of lm is *not* misleading at this point,
subset an optional vector specifying a subset of
observations to be
used in the fitting process.
which implies that data are subsetted once lm.fit is called.
All the same, this behavior is a little unexpected to me.
Is it to be considered a feature, that is, does it produce
side effects which explain why it works as it does?
Regards,
Markus
I am running R on a Debian testing system with kernel 2.6.10 and
_
platform i386-pc-linux-gnu
arch i386
os linux-gnu
system i386, linux-gnu
status
major 2
minor 0.1
year 2004
month 11
day 15
language R
--
Markus Jantti
Abo Akademi University
markus.jantti@iki.fi
http://www.iki.fi/~mjantti