Hi Rolf,
Thanks very much for your response. You are right - my simulated
example works as intended, so it can't be used to get to the bottom of
this problem (if it is a problem at all).
Here is another example, which is the one I actually worked with when
I thought maybe something is not quite right in the universe.
The example is based on a real data set (please keep it
confidential), which is attached to this e-mail as mod.data.csv.
This data set includes terminal run numbers for salmon, recorded at
Age_5, Age_4, Age_3 and Age_2. A model of the form lm(Age_5 ~ 0 +
Age_4 + Age_3 + Age_2) is fitted to these data and the goal is to
visualize the effects of Age_4, Age_3 and Age_2 on Age_5. For
biological reasons, this model is supposed to not have an intercept.
The attachment Effect_1.pdf shows what these effects look like. If
the model has no intercept, should the confidence bands still flare up
as one moves away from the value of the predictor whose effect we care
about?
The attachment Effect_2.pdf replicates the effects plots but this
time using the effects package.
If predict() is correct, does statistical theory lead us to expect
confidence bands with this particular behaviour?
Intuitively, I would have expected them to look like a fan plot that
starts out at zero and then flares up as we move away from zero.
Here is the R code I used to create the two attached plots (with R
x64 3.1.0). In this code, Age_5 becomes y, Age_4 becomes x, Age_3
becomes z and Age_2 becomes v.
## read mod.data into R
mod.data <- read.csv("mod.data.csv")

> Here is some R code for looking at standard errors of fitted values
> when the model includes a single predictor and no intercept:
>
> ## simulate a response y and two predictors x and z
> ## (illustrative simulated values)
> x <- rnorm(100)
> z <- rnorm(100)
> y <- 2*x + 3*z + rnorm(100)
>
> ## fit a linear model with no intercept but with one predictor
>
> mod <- lm(y ~ 0 + x)
>
> ## compute confidence bands (i.e., fitted values +/- 1.96 standard
> ## errors of fitted values)
>
> conf.band.x <- predict(mod,
>                        newdata=data.frame(x=seq(from=ceiling(min(x)),
>                                                 to=floor(max(x)),by=0.01)),
>                        interval="confidence")
>
> ## display confidence bands
>
> conf.band.x <- data.frame(lwr=conf.band.x[,"lwr"],
>                           fit=conf.band.x[,"fit"],
>                           upr=conf.band.x[,"upr"])
>
> matplot(x=seq(from=ceiling(min(x)),to=floor(max(x)),by=0.01),
y=conf.band.x, type="l", xlab="x", ylab="y")
> abline(v=mean(x),lty=3,col="magenta")
> title("Effect of x on y")
>
> According to statistical theory, in a model with no intercept and
one predictor, the standard errors should be directly
> proportional to the value of x at which they are evaluated. If x=0,
the standard errors should also be zero. If x increases,
> the standard errors should also increase. The resulting plot
produced by matplot shows this is not the case - the standard
> errors appear to increase as one moves away from the average value
of x. We would expect this behaviour if the model included
> an intercept, which is not the case here.
>
> Here is some R code for looking at standard errors of fitted values
when the model includes no intercept and two predictors x
> and z. In this code, the value of the predictor z is set to its
average level.
> ## linear model with no intercept but with two predictors
>
> mod <- lm(y ~ 0 + x + z)
>
> conf.band.x <- predict(mod,
>                        newdata=data.frame(x=seq(from=ceiling(min(x)),
>                                                 to=floor(max(x)),by=0.01),
>                                           z = mean(z)),
>                        interval="confidence")
>
> conf.band.x <- data.frame(lwr=conf.band.x[,"lwr"],
>                           fit=conf.band.x[,"fit"],
>                           upr=conf.band.x[,"upr"])
>
> matplot(x=seq(from=ceiling(min(x)),to=floor(max(x)),by=0.01),
y=conf.band.x, type="l", xlab="x", ylab="y")
> abline(v=mean(x),lty=3,col="magenta")
> title("Partial Effect of x on y (obtained by setting z to its
average level)")
>
> Again, the standard errors seem to behave as though they came from a
> model with an intercept (given that they flare up as one moves away
> from the average value of the predictor x).
>
> I would very much appreciate any clarifications or suggestions for
how to fix this problem.
>
> If the problem is confirmed, it appears to also carry over to the
effects package in R, which constructs plots similar to the
> ones produced by matplot above by relying on the predict()
function.
--
Rolf Turner
Technical Editor ANZJS
Error in predict.lm() for models without intercept?
3 messages · isabella at ghement.ca, Rolf Turner, Peter Dalgaard
What's going on is very simple. I apologize for not getting my thoughts
organized from the start.
Confidence bands for mu_y (the expected value of y) will have straight
line edges if and only if there is a *single predictor* in the model.
If there are two or more predictors the edges will be curved. It
doesn't really matter whether one of the predictors is the constant
(intercept) term or not.
The standard error of the estimate of mu_y is
s * sqrt(t(x)%*%H%*%x)
where x is the value of the covariates at which y is being predicted and
H = (t(X)%*%X)^{-1} --- where in turn X is the design matrix.
If there is a single predictor "x" then X is n x 1 and H is 1 x 1, i.e.
a scalar, say h. Thus the standard error is
s*sqrt(h)*|x|
and you get the proportionality to the absolute value of x.
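[Editorial note: the one-predictor claim is easy to verify numerically. A minimal sketch, where the simulated data, seed, and evaluation points are invented purely for illustration:]

```r
## check: for lm(y ~ 0 + x), the standard error of the fitted value
## returned by predict.lm() equals s * sqrt(h) * |x|, where
## h = (t(X) %*% X)^{-1} = 1 / sum(x^2) is a scalar.
set.seed(42)                        # illustrative data, not from the thread
x <- runif(30, -3, 3)
y <- 2 * x + rnorm(30)
fit <- lm(y ~ 0 + x)

s  <- summary(fit)$sigma            # residual standard error
h  <- 1 / sum(x^2)                  # (X'X)^{-1} for a single column
xx <- c(-2, 0, 1.5)                 # points at which to evaluate the band

se.formula <- s * sqrt(h) * abs(xx)
se.predict <- predict(fit, newdata = data.frame(x = xx),
                      se.fit = TRUE)$se.fit

## the two agree, the se is exactly 0 at x = 0, and se/|x| is constant:
## straight-line band edges through the origin
```

[So with a single predictor and no intercept the bands do fan out from zero, as expected for that case.]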
If there are two predictors, x_1 and x_2 then the standard error has the
form
s*sqrt(a*x_1^2 + 2*b*x_1*x_2 + c*x_2^2)
i.e. s times the square root of a quadratic form. This will *not* be a
straight line function of x_1 when x_2 is held constant, and vice versa.
Thus your expectation of seeing straight line edges to confidence bands
was misguided. This happens only when there is just *one* predictor.
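[Editorial note: the two-predictor case can be checked the same way; again a minimal sketch with invented data, where the names, seed, and ranges are illustrative only:]

```r
## check: for lm(y ~ 0 + x + z), the standard error at (x0, mean(z)) is
## s * sqrt(t(c(x0, zbar)) %*% H %*% c(x0, zbar)) with H = (X'X)^{-1},
## which is NOT proportional to |x0| -- so the band edges are curved.
set.seed(7)                          # illustrative data, not from the thread
x <- runif(40, -2, 2)
z <- runif(40, 1, 3)                 # kept away from 0 so mean(z) matters
y <- x + 2 * z + rnorm(40)
fit2 <- lm(y ~ 0 + x + z)

s  <- summary(fit2)$sigma
H  <- solve(crossprod(cbind(x, z)))  # (X'X)^{-1}, a 2 x 2 matrix
xx <- c(-2, -1, 1, 2)
X0 <- cbind(xx, mean(z))             # evaluation points with z at its mean

se.formula <- s * sqrt(rowSums((X0 %*% H) * X0))  # row-wise quadratic form
se.predict <- predict(fit2, newdata = data.frame(x = xx, z = mean(z)),
                      se.fit = TRUE)$se.fit

## se.formula and se.predict agree, but se.predict / abs(xx) is no longer
## constant: holding z fixed, the edges curve instead of fanning from 0
```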
Again I'm sorry I didn't get my head together about this earlier and
save you sending me your data set. The issue has nothing whatever to do
with your particular data set --- it's just basic mathematics.
There is nothing at all wrong with predict.lm().
cheers,
Rolf
P.S. This is not an R matter at all, so if you have any further
questions you should email me off-list.
R.
On 17/09/14 15:58, isabella at ghement.ca wrote:
Hi Rolf,
Thanks very much for your response. You are right - my simulated
example works as intended, so it can't be used to get to the bottom of
this problem (if it is a problem at all).
Here is another example, which is the one I actually worked with when I
thought maybe something is not quite right in the universe.
The example is based on a real data set (please keep it confidential),
which is attached to this e-mail as *mod.data.csv*. This data set
includes terminal run numbers for salmon, recorded at Age_5, Age_4,
Age_3 and Age_2. A model of the form lm(Age_5 ~ 0 + Age_4 + Age_3 +
Age_2) is fitted to these data and the goal is to visualize the effects
of Age_4, Age_3 and Age_2 on Age_5. For biological reasons, this model
is supposed to not have an intercept.
The attachment Effect_1.pdf shows what these effects look like. If the
model has no intercept, should the confidence bands still flare up
as one moves away from the value of the predictor whose effect we care
about?
The attachment Effect_2.pdf replicates the effects plots but this time
using the effects package.
If predict() is correct, does statistical theory lead us to expect
confidence bands with this particular behaviour? Intuitively, I would
have expected them to look like a fan plot that starts out at zero and
then flares up as we move away from zero.
Here is the R code I used to create the two attached plots (with R x64
3.1.0). In this code, Age_5 becomes y, Age_4 becomes x, Age_3 becomes z
and Age_2 becomes v.
## read mod.data into R
mod.data <- read.csv("mod.data.csv")
y <- mod.data$Age_5
x <- mod.data$Age_4
z <- mod.data$Age_3
v <- mod.data$Age_2
## fit a linear model with no intercept but with one predictor
mod <- lm(y ~ 0 + x)
## compute confidence bands (i.e., fitted values +/- 1.96 standard
## errors of fitted values)
conf.band.x <- predict(mod,
                       newdata=data.frame(x = seq(from=0,to=max(x),by=100)),
                       interval="confidence")
## display confidence bands
conf.band.x <- data.frame(lwr=conf.band.x[,"lwr"],
fit=conf.band.x[,"fit"],
upr=conf.band.x[,"upr"])
matplot(x=seq(from=0,to=max(x),by=100), y=conf.band.x, type="l",
        xlab="x", ylab="y")
## abline(v=0,lty=3,col="magenta")
title("Effect of x on y")
## linear model with no intercept but with three predictors
par(mfrow=c(2,2))
mod <- lm(y ~ 0 + x + z + v)
## effect of x on y
conf.band.x <- predict(mod,newdata=data.frame(x =
seq(from=0,to=max(x),by=100),
z = mean(z),
v=mean(v)),
interval="confidence")
conf.band.x <- data.frame(lwr=conf.band.x[,"lwr"],
fit=conf.band.x[,"fit"],
upr=conf.band.x[,"upr"])
matplot(x=seq(from=0,to=max(x),by=100), y=conf.band.x, type="l",
xlab="x", ylab="y")
abline(v=mean(x),lty=3,col="magenta")
title("Effect of x on y (obtained by setting z and v to their average
levels)")
## effect of z on y
conf.band.z <- predict(mod,newdata=data.frame(z = seq(0,to=max(z),by=100),
x = mean(x),v=mean(v)),
interval="confidence")
conf.band.z <- data.frame(lwr=conf.band.z[,"lwr"],
fit=conf.band.z[,"fit"],
upr=conf.band.z[,"upr"])
matplot(seq(from=0,to=max(z),by=100), y=conf.band.z, type="l", xlab="z",
ylab="y")
abline(v=mean(z),lty=3,col="magenta")
title("Effect of z on y (obtained by setting x and v to their average
levels)")
## effect of v on y
conf.band.v <- predict(mod,newdata=data.frame(v = seq(0,to=max(v),by=100),
x = mean(x),z=mean(z)),
interval="confidence")
conf.band.v <- data.frame(lwr=conf.band.v[,"lwr"],
fit=conf.band.v[,"fit"],
upr=conf.band.v[,"upr"])
matplot(seq(from=0,to=max(v),by=100), y=conf.band.v, type="l", xlab="v",
ylab="y")
abline(v=mean(v),lty=3,col="magenta")
title("Effect of v on y (obtained by setting x and z to their average
levels)")
###
### Replicate effect plots using effects package
###
require(effects)
plot(allEffects(mod))
Maybe my intuition is playing tricks on me - if you have any idea as to
what may be going on here, please let me know.
--
Rolf Turner
Technical Editor ANZJS
On 17 Sep 2014, at 05:58, isabella at ghement.ca wrote:
> The example is based on a real data set (please keep it confidential), which is attached to this e-mail as mod.data.csv.
"Don't worry, it will stay between you two and the Internet...." (Actually, the list software seems to have scrubbed it, but that's probably only because your mailer sends with non-text mime-type.)
--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk   Priv: PDalgd at gmail.com