Hello all,
This is something that I am sure has a really suave solution in R, but I can't quite figure out the best (or even a basic) way to do it.
I have a simple linear regression fit with lm for which I would like to estimate the x-intercept with some measure of error around it (a confidence interval). In biology, there is the concept of a developmental zero - a temperature below which development will not happen. This is often estimated by extrapolating a curve of developmental rate as a function of temperature. This developmental zero is generally reported without error. I intend to change this! There has to be some way to assign error to this term; I just have yet to figure it out.
Now, it is simple enough to calculate the x-intercept itself ( -intercept / slope ), but it is a whole separate process to generate a confidence interval for it. I can't figure out how to propagate the error of the slope and intercept into the ratio of the two. The one option I have tried is to use the predict function to look for where the confidence bands cross the axis, but this hasn't been too fruitful either.
I would greatly appreciate any insight you may be able to share.
Cheers,
Kevin
Here is a small representative sample of some of my data where Dev.Rate ~ Temperature.
t <-
structure(list(Temperature = c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 9, 10.5,
10.5, 10.5, 10.5, 10.5, 10.5, 10.5, 10.5, 10.5, 10.5, 10.5, 10.5,
10.5, 10.5, 10.5, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 14, 14, 14, 14, 16, 16, 16, 16, 16,
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 18, 18,
18, 18, 18, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20,
20, 20, 22, 22), Dev.Rate = c(0.007518797, 0.007518797, 0.007518797,
0.007194245, 0.007194245, 0.007194245, 0.007194245, 0.007194245,
0.007194245, 0.006896552, 0.006896552, 0.012820513, 0.012820513,
0.012195122, 0.012195122, 0.012195122, 0.012195122, 0.011363636,
0.011363636, 0.011363636, 0.011363636, 0.011363636, 0.011363636,
0.011363636, 0.010869565, 0.00952381, 0.00952381, 0.015151515,
0.015151515, 0.022727273, 0.022727273, 0.022727273, 0.022727273,
0.022727273, 0.022727273, 0.022727273, 0.022727273, 0.022727273,
0.022727273, 0.022727273, 0.022727273, 0.020833333, 0.020833333,
0.020833333, 0.034482759, 0.029411765, 0.029411765, 0.029411765,
0.029411765, 0.029411765, 0.029411765, 0.029411765, 0.029411765,
0.029411765, 0.029411765, 0.029411765, 0.029411765, 0.029411765,
0.029411765, 0.029411765, 0.027027027, 0.025, 0.038461538, 0.03030303,
0.03030303, 0.03030303, 0.052631579, 0.052631579, 0.045454545,
0.045454545, 0.045454545, 0.045454545, 0.045454545, 0.045454545,
0.045454545, 0.045454545, 0.045454545, 0.038461538, 0.038461538,
0.038461538, 0.038461538, 0.038461538, 0.038461538, 0.038461538,
0.038461538, 0.047619048, 0.047619048, 0.047619048, 0.047619048,
0.047619048, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625,
0.0625, 0.0625, 0.0625, 0.0625, 0.052631579, 0.052631579, 0.052631579,
0.052631579, 0.052631579, 0.076923077, 0.071428571)), .Names = c("Temperature",
"Dev.Rate"), class = "data.frame", row.names = c(NA, 107L))
--
==========
==========
Kevin J Emerson
Bradshaw-Holzapfel Lab
Center for Ecology and Evolutionary Biology
1210 University of Oregon
Eugene, Oregon 97403
kemerson at uoregon.edu
confidence interval or error of x intercept of a linear regression
On Mon, 23 Mar 2009, Kevin J Emerson wrote:
Now, it is simple enough to calculate the x-intercept itself ( -intercept / slope ), but it is a whole separate process to generate a confidence interval for it. [...]
A few people have written code to do nonlinear transformations automatically. There's one in my 'survey' package, and I think there's an example in Venables & Ripley's "S Programming", and I'm sure there are others. For example
library(survey)
m <- lm(Dev.Rate ~ Temperature, data = t)
svycontrast(m, quote(-`(Intercept)`/Temperature))
nlcon SE
contrast 3.8061 0.1975
The `backquotes` are necessary because the coefficient name, (Intercept), isn't a legal variable name.
The svycontrast() function works on anything that has coef() and vcov() methods. The actual delta-method computations are in survey:::nlcon.
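The same delta-method computation can be done by hand with coef() and vcov(), for anyone not using the survey package. A sketch on simulated data (true x-intercept 4):

```r
set.seed(42)
temp <- rep(c(5, 7, 9, 10.5, 12, 14, 16, 18, 20, 22), each = 10)
rate <- 0.003 * (temp - 4) + rnorm(length(temp), sd = 0.003)
m <- lm(rate ~ temp)

b <- coef(m)                     # b[1] = intercept, b[2] = slope
x0.hat <- -b[[1]] / b[[2]]       # the x-intercept estimate

# Delta method: gradient of f(b0, b1) = -b0/b1 with respect to (b0, b1)
grad <- c(-1 / b[[2]], b[[1]] / b[[2]]^2)
se <- sqrt(drop(grad %*% vcov(m) %*% grad))

# Approximate (normal-theory) 95% confidence interval
ci <- x0.hat + c(-1, 1) * qnorm(0.975) * se
```

This is the same computation svycontrast() performs internally, subject to the health warning below about denominators near zero.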
-thomas
Thomas Lumley Assoc. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle
I should probably also add the health warning that the delta-method approach to ratios of coefficients is REALLY BAD unless the denominator is well away from zero.
When the denominator is near zero nothing really works (Fieller's method is valid but only because 'valid' means less than you might like for a confidence interval method, and the bootstrap is useful but not necessarily valid)
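For comparison, a case-resampling bootstrap percentile interval is easy to sketch (simulated data again, true zero at 4; as noted above, the bootstrap here is useful but not guaranteed valid):

```r
set.seed(42)
temp <- rep(c(5, 7, 9, 10.5, 12, 14, 16, 18, 20, 22), each = 10)
rate <- 0.003 * (temp - 4) + rnorm(length(temp), sd = 0.003)
dat <- data.frame(temp, rate)

# Resample rows with replacement, refit, recompute -intercept/slope
boot.x0 <- replicate(2000, {
  i <- sample(nrow(dat), replace = TRUE)
  b <- coef(lm(rate ~ temp, data = dat[i, ]))
  -b[[1]] / b[[2]]
})

# Percentile 95% interval
ci <- quantile(boot.x0, c(0.025, 0.975))
```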
-thomas
On 24-Mar-09 03:31:32, Kevin J Emerson wrote:
...
The difficulty with the situation you describe is that you are up
against what is known as the "Calibration Problem" in statistics.
Essentially, the difficulty is that in the theory of the linear
regression there is sufficient probability for the slope to be
very close to zero -- and hence that the "X-intercept" may be
very far out to the left or the right -- that the whole concept
of standard error ceases to exist. Even the expected position of
the intercept does not exist. And this is true regardless of the
data (the one exception would be where it is absolutely certain
that the data will lie exactly on a straight line).
This has been addressed a number of times, and controversially,
in the statistical literature. One approach which I like (having
thought of it myself :)) is the use of "likelihood-based confidence
intervals".
Given a linear regression
Y = a + b*X + E
(where the error E has N(0, sig^2) distribution), suppose you
want to estimate the value X0 of X for which Y = Y0, a given value
(in your case Y0 = 0). For simplicity, measure X and Y from their
means sum(X)/N and sum(Y)/N.
The approach is based on comparing the likelihoods
[A] for unresticted ML estimation of a, b and sig
--> a.hat, b.hat, sig.hat
(where a.hat = 0 because of the origins of X and Y)
[B] for ML estimation assuming a particular value X1 for X0
--> a.tilde, b.tilde, sig.tilde
where
[A] a.hat = 0 (as above),
b.hat = sum(X*Y)/sum(X^2)
X0.hat = Y0/b.hat [ = -mean(Y)/b.hat in your case ]
sig.hat^2 = sum((Y - b.hat*X)^2)/(N+1)
[B] a.tilde = (Y0*sum(X^2) - X1*sum(X*Y))/D
b.tilde = ((N+1)*sum(X*Y) + N*X1*Y0)/D
sig.tilde^2
= (sum((Y - a.tilde - b.tilde*X)^2)
+ (Y0 - a.tilde - b.tilde*X1)^2)/(N+1)
where D = (N+1)*sum(X^2) + N*X1^2
Then let R(X1) = (sig.hat^2)/(sig.tilde^2)
A confidence interval for X0 is the set of values of X1 such
that R(X1) > R0 say, where Prob(R(X0)>R0) = P, the desired
confidence level, when X0 is the true value.
It can be established that
(N-2)*(1 - R(X0))/R(X0)
has the F distribution with 1 and (N-1) degrees of freedom.
Since this is independent of X0, the resulting set of values
of X1 constitute a confidence interval.
This was described in
Harding, E.F. (1986). Modelling: the classical approach.
The Statistician vol. 35, pp. 115-134
(see pages 123-126)
and has been later referred to by P.J. Brown (though I don't
have the references to hand just now).
When I have time for it (not today) I'll see if I can implement
this neatly in R. It's basically a question of solving
(N-2)*(1 - R(X0))/R(X0) = qf(P,1,(N-1))
for X0 (two solutions, maybe one, if any exist). However, it is
quite possible that no solution exists (depending on P), so that
the confidence interval would then be (-inf,+inf); or, when only
one exists, it will be either (-inf,X0) or (X0,inf).
These possibilities of infinite intervals are directly related
to the possibility that, at significance level alpha = (1-P),
the estimated slope may not differ significantly from 0.
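Pending Ted's closed-form version, the same idea can be sketched numerically by test inversion: for each candidate X1, fit the constrained line through (X1, 0) and keep the candidates whose F statistic against the unrestricted fit stays below the critical value. This is a stand-in for solving the equation above, not Ted's exact formulas, and it uses the usual 1 and N-2 degrees of freedom (simulated data, true zero at 4):

```r
set.seed(42)
temp <- rep(c(5, 7, 9, 10.5, 12, 14, 16, 18, 20, 22), each = 10)
rate <- 0.003 * (temp - 4) + rnorm(length(temp), sd = 0.003)

m <- lm(rate ~ temp)
rss.full <- deviance(m)   # unrestricted residual sum of squares
n <- length(rate)

# F statistic for the constraint "the line passes through (x1, 0)"
f.stat <- function(x1) {
  # constrained fit: rate = b * (temp - x1), no free intercept
  rss.c <- deviance(lm(rate ~ 0 + I(temp - x1)))
  (n - 2) * (rss.c - rss.full) / rss.full
}

# The 95% confidence set: candidates whose constraint is not rejected
grid <- seq(0, 8, by = 0.01)
keep <- grid[sapply(grid, f.stat) <= qf(0.95, 1, n - 2)]
ci <- range(keep)
```

When the slope is not significantly different from zero at the chosen level, the kept set runs off the ends of any finite grid, which is exactly the infinite-interval behaviour described above.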
Maybe more later!
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 24-Mar-09 Time: 10:04:30
------------------------------ XFMail ------------------------------
(Ted Harding) wrote:
On 24-Mar-09 03:31:32, Kevin J Emerson wrote:
...
When I have time for it (not today) I'll see if I can implement this neatly in R. It's basically a question of solving (N-2)*(1 - R(X0))/R(X0) = qf(P,1,(N-1)) for X0 (two solutions, maybe one, if any exist).
<etc.>

A quick and probably not-too-dirty way is to rewrite it as a nonlinear model:

x <- 1:10
Y <- 2*x - 3 + rnorm(10, sd=.1)
cf <- coef(lm(Y~x))
confint(nls(Y~beta*(x-x0), start=c(beta=cf[[2]], x0=-cf[[1]]/cf[[2]])))
--
Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen
Øster Farimagsgade 5, Entr. B, PO Box 2099, 1014 Cph. K, Denmark
Ph: (+45) 35327918  (p.dalgaard at biostat.ku.dk)  FAX: (+45) 35327907
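Applied to developmental data (simulated here, true zero 4 degrees C), this reparameterisation gives a profile-likelihood interval for the developmental zero directly:

```r
set.seed(42)
temp <- rep(c(5, 7, 9, 10.5, 12, 14, 16, 18, 20, 22), each = 10)
rate <- 0.003 * (temp - 4) + rnorm(length(temp), sd = 0.003)

# Start nls from the lm-based estimates, as suggested above
cf <- coef(lm(rate ~ temp))
fit <- nls(rate ~ beta * (temp - x0),
           start = c(beta = cf[[2]], x0 = -cf[[1]] / cf[[2]]))

# Profile-likelihood 95% CI; the "x0" row is the developmental zero
ci <- confint(fit)["x0", ]
```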
I wanted to send out a quick thanks to all who replied to my query about estimating the confidence interval around the x-intercept of a linear regression. The method I was able to implement most straightforwardly is taken from Section 3.2 of Draper and Smith (1998), Applied Regression Analysis. The code follows - it's not the fanciest code, but it gets the job done. I will see if I can work out the other suggestions that were made and see how they all turn out.
xInterceptCI <- function(x, alpha = 0.05, ...) {
    # x: an lm fit of the form lm(y ~ x); returns c(lower, upper) for
    # the x-intercept, following Draper and Smith (1998), Section 3.2
    intercept <- coef(x)[1]
    slope <- coef(x)[2]
    meanX <- mean(x$model[, 2])    # column 2 of the model frame is the predictor
    n <- length(x$model[, 2])
    tstar <- qt(alpha/2, n - 2)    # note: negative, which makes lower < upper below
    sxx <- sum(x$model[, 2]^2) - sum(x$model[, 2])^2 / n
    SSresidual <- (1 - cor(x$model[, 1], x$model[, 2])^2) *
        (sum(x$model[, 1]^2) - sum(x$model[, 1])^2 / n)
    S <- sqrt(SSresidual / (n - 2))    # residual standard error
    SEslope <- S / sqrt(sxx)
    Xintercept <- -intercept / slope   # point estimate
    y0 <- 0                            # the y value being inverted to an x
    g <- (tstar / (slope / SEslope))^2
    left <- (Xintercept - meanX) * g
    bottom <- 1 - g
    Right <- (tstar * S / slope) * sqrt((Xintercept - meanX)^2 / sxx + bottom / n)
    lower <- Xintercept + (left + Right) / bottom
    upper <- Xintercept + (left - Right) / bottom
    return(c(lower, upper))
}
# Example: xInterceptCI(lm(Dev.Rate ~ Temperature, data = t))
On Tue, 24 Mar 2009 17:16:41 +0100, Peter Dalgaard <P.Dalgaard at biostat.ku.dk> wrote:
...