
confidence interval or error of x intercept of a linear regression

6 messages · Thomas Lumley, (Ted Harding), Peter Dalgaard +1 more

#
Hello all,

This is something that I am sure has a really suave solution in R, but I can't quite figure out the best (or even a basic) way to do it.

I have a simple linear regression fit with lm() for which I would like to estimate the x-intercept with some measure of error around it (a confidence interval).  In biology, there is the concept of a developmental zero - a temperature below which development will not happen. This is often estimated by extrapolating a curve of developmental rate as a function of temperature, and the developmental zero is generally reported without error.  I intend to change this!  There has to be some way to assign error to this term; I just have yet to figure it out.

Now, it is simple enough to calculate the x-intercept itself ( -intercept / slope ), but generating its confidence interval is a whole separate process.  I can't figure out how to propagate the error of the slope and intercept into the ratio of the two.  The other option I have tried is to use the predict function to look for where the confidence bands cross the axis, but this hasn't been too fruitful either.

I would greatly appreciate any insight you may be able to share.

Cheers,
Kevin

Here is a small representative sample of some of my data where Dev.Rate ~ Temperature.

t <-
structure(list(Temperature = c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 
5, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 9, 10.5, 
10.5, 10.5, 10.5, 10.5, 10.5, 10.5, 10.5, 10.5, 10.5, 10.5, 10.5, 
10.5, 10.5, 10.5, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 
12, 12, 12, 12, 12, 12, 12, 14, 14, 14, 14, 16, 16, 16, 16, 16, 
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 18, 18, 
18, 18, 18, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 
20, 20, 22, 22), Dev.Rate = c(0.007518797, 0.007518797, 0.007518797, 
0.007194245, 0.007194245, 0.007194245, 0.007194245, 0.007194245, 
0.007194245, 0.006896552, 0.006896552, 0.012820513, 0.012820513, 
0.012195122, 0.012195122, 0.012195122, 0.012195122, 0.011363636, 
0.011363636, 0.011363636, 0.011363636, 0.011363636, 0.011363636, 
0.011363636, 0.010869565, 0.00952381, 0.00952381, 0.015151515, 
0.015151515, 0.022727273, 0.022727273, 0.022727273, 0.022727273, 
0.022727273, 0.022727273, 0.022727273, 0.022727273, 0.022727273, 
0.022727273, 0.022727273, 0.022727273, 0.020833333, 0.020833333, 
0.020833333, 0.034482759, 0.029411765, 0.029411765, 0.029411765, 
0.029411765, 0.029411765, 0.029411765, 0.029411765, 0.029411765, 
0.029411765, 0.029411765, 0.029411765, 0.029411765, 0.029411765, 
0.029411765, 0.029411765, 0.027027027, 0.025, 0.038461538, 0.03030303, 
0.03030303, 0.03030303, 0.052631579, 0.052631579, 0.045454545, 
0.045454545, 0.045454545, 0.045454545, 0.045454545, 0.045454545, 
0.045454545, 0.045454545, 0.045454545, 0.038461538, 0.038461538, 
0.038461538, 0.038461538, 0.038461538, 0.038461538, 0.038461538, 
0.038461538, 0.047619048, 0.047619048, 0.047619048, 0.047619048, 
0.047619048, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 
0.0625, 0.0625, 0.0625, 0.0625, 0.052631579, 0.052631579, 0.052631579, 
0.052631579, 0.052631579, 0.076923077, 0.071428571)), .Names = c("Temperature", 
"Dev.Rate"), class = "data.frame", row.names = c(NA, 107L))
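
For concreteness, the point estimate described above can be computed from this sample data (a sketch; the straight-line fit of Dev.Rate on Temperature is assumed, as in the rest of the thread):

```r
## Fit the regression and compute the x-intercept point estimate,
## -intercept/slope, from the posted sample data `t`
fit <- lm(Dev.Rate ~ Temperature, data = t)
cf <- coef(fit)
unname(-cf[1] / cf[2])   # estimated developmental zero (degrees)
```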



--
==========
==========
Kevin J Emerson
Bradshaw-Holzapfel Lab
Center for Ecology and Evolutionary Biology
1210 University of Oregon
Eugene, Oregon 97403
kemerson at uoregon.edu
#
On Mon, 23 Mar 2009, Kevin J Emerson wrote:

A few people have written code to do nonlinear transformations automatically. 
There's one in my 'survey' package, and I think there's an example in Venables 
& Ripley's "S Programming", and I'm sure there are others.

For example
nlcon     SE
contrast 3.8061  0.1975

The `backquotes` are necessary because the coefficient name, (Intercept), isn't a legal variable name.

The svycontrast() function works on anything that has coef() and vcov() methods. The actual delta-method computations are in survey:::nlcon
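
For the x-intercept here, a call might look like the following (a sketch only: the model fit and variable names are taken from the original post, since the call that produced the output above was lost):

```r
library(survey)   # provides svycontrast()
fit <- lm(Dev.Rate ~ Temperature, data = t)
## delta-method SE for the nonlinear contrast -(Intercept)/Temperature
svycontrast(fit, quote(-`(Intercept)` / Temperature))
```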

        -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
#
I should probably also add the health warning that the delta-method approach to ratios of coefficients is REALLY BAD unless the denominator is well away from zero.

When the denominator is near zero nothing really works (Fieller's method is valid but only because 'valid' means less than you might like for a confidence interval method, and the bootstrap is useful but not necessarily valid)

      -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
#
On 24-Mar-09 03:31:32, Kevin J Emerson wrote:
The difficulty with the situation you describe is that you are up
against what is known as the "Calibration Problem" in Statistics.
Essentially, the difficulty is that in the theory of the linear
regression there is sufficient probability for the slope to be
very close to zero -- and hence that the "X-intercept" may be
very far out to the left or the right -- that the whole concept
of standard error ceases to exist. Even the expected position of
the intercept does not exist. And this is true regardless of the
data (the one exception would be where it is absolutely certain
that the data will lie exactly on a straight line).

This has been addressed a number of times, and controversially,
in the statistical literature. One approach which I like (having
thought of it myself :)) is the use of "likelihood-based confidence
intervals".

Given a linear regression

  Y = a + b*X + E

(where the error E has N(0, sig^2) distribution), suppose you
want to estimate the value X0 of X for which Y = Y0, a given value
(in your case Y0 = 0). For simplicity, measure X and Y from their
means sum(X)/N and sum(Y)/N.

The approach is based on comparing the likelihoods

[A] for unrestricted ML estimation of a, b and sig
    --> a.hat, b.hat, sig.hat
    (where a.hat = 0 because of the origins of X and Y)
[B] for ML estimation assuming a particular value X1 for X0
    --> a.tilde, b.tilde, sig.tilde

where
[A] a.hat     = 0 (as above),
    b.hat     = sum(X*Y)/sum(X^2)
    X0.hat    = Y0/b.hat [ = -mean(Y)/b.hat in your case ]
    sig.hat^2 = sum((Y - b.hat*X)^2)/(N+1)

[B] a.tilde     = (Y0*sum(X^2) - X1*sum(X*Y))/D
    b.tilde     = ((N+1)*sum(X*Y) + N*X1*Y0)/D
    sig.tilde^2 = (sum((Y - a.tilde - b.tilde*X)^2)
                    + (Y0 - a.tilde - b.tilde*X1)^2)/(N+1)

where D = (N+1)*sum(X^2) + N*X1^2

Then let R(X1) = (sig.hat^2)/(sig.tilde^2)

A confidence interval for X0 is the set of values of X1 such
that R(X1) > R0 say, where Prob(R(X0)>R0) = P, the desired
confidence level, when X0 is the true value.

It can be established that

  (N-2)*(1 - R(X0))/R(X0)

has the F distribution with 1 and (N-1) degrees of freedom.
Since this is independent of X0, the resulting set of values
of X1 constitute a confidence interval.

This was described in

  Harding, E.F. (1986). Modelling: the classical approach.
  The Statistician vol. 35, pp. 115-134
  (see pages 123-126)

and has since been referred to by P.J. Brown (though I don't
have the references to hand just now).

When I have time for it (not today) I'll see if I can implement
this neatly in R. It's basically a question of solving

  (N-2)*(1 - R(X0))/R(X0) = qf(P,1,(N-1))

for X0 (two solutions, maybe one, if any exist). However, it is
quite possible that no solution exists (depending on P), so that
the confidence interval would then be (-inf,+inf); or, when only
one exists, it will be either (-inf,X0) or (X0,inf).
These possibilities of infinite intervals are directly related
to the possibility that, at significance level alpha = (1-P),
the estimated slope may not differ significantly from 0.
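
In R, the root-finding could be sketched as follows (a transcription of the formulas above, not a tested implementation; the uniroot brackets are ad hoc and the search will fail in exactly the infinite-interval cases just described):

```r
lr.xintercept.ci <- function(X, Y, Y0 = 0, P = 0.95) {
  ## centre X and Y about their means, shifting Y0 accordingly
  Xbar <- mean(X); Ybar <- mean(Y)
  x <- X - Xbar; y <- Y - Ybar; y0 <- Y0 - Ybar
  N <- length(x)
  b.hat <- sum(x*y) / sum(x^2)
  sig2.hat <- sum((y - b.hat*x)^2) / (N + 1)
  X0.hat <- y0 / b.hat                     # ML estimate of the x-value
  R <- function(X1) {                      # likelihood ratio R(X1)
    D <- (N + 1)*sum(x^2) + N*X1^2
    b.t <- ((N + 1)*sum(x*y) + N*X1*y0) / D
    a.t <- (y0*sum(x^2) - X1*sum(x*y)) / D
    sig2.t <- (sum((y - a.t - b.t*x)^2) +
               (y0 - a.t - b.t*X1)^2) / (N + 1)
    sig2.hat / sig2.t
  }
  f <- function(X1) (N - 2)*(1 - R(X1))/R(X1) - qf(P, 1, N - 1)
  w <- 100*(abs(X0.hat) + 1)               # ad hoc search width
  lo <- uniroot(f, c(X0.hat - w, X0.hat))$root
  hi <- uniroot(f, c(X0.hat, X0.hat + w))$root
  Xbar + c(lo, hi)                         # back to the original X scale
}
```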

Maybe more later!
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 24-Mar-09                                       Time: 10:04:30
------------------------------ XFMail ------------------------------
#
(Ted Harding) wrote:
...
<etc.>

A quick and probably not-too-dirty way is to rewrite it as a nonlinear
model:

x <- 1:10
Y <- 2*x - 3 + rnorm(10, sd = .1)
cf <- coef(lm(Y ~ x))
## reparameterise so the x-intercept x0 is itself a model parameter;
## confint() on the nls fit then profiles it directly
confint(nls(Y ~ beta*(x - x0), start = c(beta = cf[[2]], x0 = -cf[[1]]/cf[[2]])))
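
Applied to the data posted earlier in the thread, the same reparameterisation might read (assuming the data frame t from the first message; an untested sketch):

```r
fit <- lm(Dev.Rate ~ Temperature, data = t)
cf <- coef(fit)
## x0 is the x-intercept (the developmental zero); confint() profiles it
confint(nls(Dev.Rate ~ beta * (Temperature - x0), data = t,
            start = c(beta = cf[[2]], x0 = -cf[[1]] / cf[[2]])))
```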
#
I wanted to send out a quick thanks to all who replied to my query about estimating the confidence interval around the x-intercept of a linear regression.  The method I was able to implement most straightforwardly is taken from Section 3.2 of Draper and Smith (1998), Applied Regression Analysis. The code follows - it's not the fanciest code, but it gets the job done.  I will see if I can work out the other suggestions that were made and see how they all turn out.

xInterceptCI <- function(x, alpha = 0.05, ...) {
  ## x: a fitted lm object; its model frame holds the response in
  ## column 1 and the predictor in column 2
  intercept <- coef(x)[1]
  slope     <- coef(x)[2]
  predictor <- x$model[, 2]
  response  <- x$model[, 1]
  meanX <- mean(predictor)
  n     <- length(predictor)
  tstar <- qt(1 - alpha/2, n - 2)                 # upper critical value
  sxx   <- sum(predictor^2) - sum(predictor)^2 / n
  ## residual SS via r^2: SSresidual = (1 - r^2) * Syy
  SSresidual <- (1 - cor(response, predictor)^2) *
                (sum(response^2) - sum(response)^2 / n)
  S       <- sqrt(SSresidual / (n - 2))           # residual std. error
  SEslope <- S / sqrt(sxx)
  Xintercept <- -intercept / slope                # point estimate (y0 = 0)
  g      <- (tstar / (slope / SEslope))^2         # Fieller's g statistic
  left   <- (Xintercept - meanX) * g
  bottom <- 1 - g
  right  <- (tstar * S / slope) *
            sqrt((Xintercept - meanX)^2 / sxx + bottom / n)
  lower <- Xintercept + (left - right) / bottom
  upper <- Xintercept + (left + right) / bottom
  c(lower, upper)
}
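
A call for the sample data posted at the start of the thread might look like this (hypothetical usage, not part of the original post):

```r
fit <- lm(Dev.Rate ~ Temperature, data = t)
xInterceptCI(fit)                 # default 95% Fieller-type interval
xInterceptCI(fit, alpha = 0.10)   # 90% interval
```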