confidence interval in "predict.lm" - R-help

Fred Mellender

Fri, Nov 15, 2002 8:43 AM

I am studying statistics using R and a book, "Understandable Statistics", by
Brase and Brase.  The book has two worked examples for calculating a
confidence interval around a predicted value from a linear model.  The
answers to the two examples in the book differ from those I get from R.  The
regression line, the standard error, and the predicted value in R and the
book all agree for the examples.  Hence I gather that R and the book use
different formulas to calculate the confidence interval.  Could someone
explain why the difference exists, which function(s) in R I might use to get
the answers in the book, and (perhaps) which method to use in various
situations?

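The example:

x <- c(10, 20, 30, 40, 50, 60, 70)
y <- c(17, 21, 25, 28, 33, 40, 49)
dat <- data.frame(temp = x, amnt = y)
dat

  temp amnt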
1   10   17
2   20   21
3   30   25
4   40   28
5   50   33
6   60   40
7   70   49

being a table of temperatures (temp) and the corresponding amounts (amnt) of
copper sulfate that dissolve in 100g of water at that temperature.

The regression line:

mod <- lm(amnt ~ temp, dat)
summary(mod)

Call:
lm(formula = amnt ~ temp, data = dat)

Residuals:
      1       2       3       4       5       6       7
 1.7857  0.7143 -0.3571 -2.4286 -2.5000 -0.5714  3.3571

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.14286    1.98463   5.111  0.00374 **
temp         0.50714    0.04438  11.428 8.98e-05 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 2.348 on 5 degrees of freedom
Multiple R-Squared: 0.9631,     Adjusted R-squared: 0.9558
F-statistic: 130.6 on 1 and 5 DF,  p-value: 8.985e-05

The .95 confidence interval for a temperature of 45 degrees:

foo <- predict(mod, data.frame(temp = 45), level = .95,
               interval = "confidence", se.fit = TRUE)
foo

$fit
          fit      lwr      upr
[1,] 32.96429 30.61253 35.31604

$se.fit
[1] 0.9148715

$df
[1] 5

$residual.scale
[1] 2.348252

The book gives the confidence interval as 26.5 <= y <= 39.5.  The book
defines the confidence interval calculation thus:

  yp - E <= y <= yp + E

  Where
   E = tc*sC *sqrt(1 + 1/n + (x-xBar)^2/SSx)
   yp is the predicted value from the regression line
   tc is the value from Student's t distribution for a confidence
    level, c, using n-2 degrees of freedom,
   sC is the standard error of estimate
   SSx is Sum(x^2)-[Sum(x)]^2/n
   n is the number of data pairs.

So even though the model, predicted value, and standard error all agree, R
gives a much narrower confidence interval than the book does.
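
For reference, the book's formula can be evaluated term by term in base R.
This is only a sketch under the definitions quoted above; the variable names
(yp, sC, SSx, tc, E) simply mirror the book's symbols, and the 95% level is
taken to be two-sided.

```r
# Data from the example above
x <- c(10, 20, 30, 40, 50, 60, 70)   # temp
y <- c(17, 21, 25, 28, 33, 40, 49)   # amnt
n <- length(x)

fit <- lm(y ~ x)
yp  <- unname(predict(fit, data.frame(x = 45)))  # predicted value at 45 degrees

sC  <- summary(fit)$sigma           # standard error of estimate
SSx <- sum(x^2) - sum(x)^2 / n      # equals sum((x - mean(x))^2)
tc  <- qt(0.975, df = n - 2)        # two-sided 95% critical value

# Half-width exactly as the book defines it (note the leading "1 +")
E <- tc * sC * sqrt(1 + 1 / n + (45 - mean(x))^2 / SSx)
book_interval <- c(lower = yp - E, upper = yp + E)
book_interval
```

With these data the result comes out at roughly 26.5 to 39.4, in line with
the book's rounded 26.5 <= y <= 39.5.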

Thanks for any advice/help.

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Martin Maechler

Fri, Nov 15, 2002 9:42 AM

You are looking for what some (most?) statisticians call a
 ``prediction interval''

==> just give "prediction" instead of "confidence":

predict(mod, data.frame(temp = 45), level = .95,
        interval = "prediction", se.fit = TRUE)

$fit
          fit      lwr     upr
[1,] 32.96429 26.48597 39.4426

$se.fit
[1] 0.9148715

$df
[1] 5

$residual.scale
[1] 2.348252
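
To see the two kinds of interval side by side, something along the following
lines should work. It is a minimal sketch that rebuilds the model from the
posted data; the object names conf and pred are only illustrative.

```r
dat <- data.frame(temp = c(10, 20, 30, 40, 50, 60, 70),
                  amnt = c(17, 21, 25, 28, 33, 40, 49))
mod <- lm(amnt ~ temp, data = dat)
new <- data.frame(temp = 45)

# Interval for the mean response at temp = 45
conf <- predict(mod, new, level = 0.95, interval = "confidence")
# Interval for a single new observation at temp = 45
pred <- predict(mod, new, level = 0.95, interval = "prediction")

both <- rbind(confidence = conf[1, ], prediction = pred[1, ])
both
```

Both rows share the same fitted value; only the lwr and upr columns differ,
with the prediction interval the wider of the two.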


Thomas Lumley

Fri, Nov 15, 2002 10:17 AM

On Fri, 15 Nov 2002, Fred Mellender wrote:

<snip>

You asked R for a confidence interval for the predicted mean at x.  If you
want a prediction interval at x you need  interval="prediction" not
interval="confidence".
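
The two intervals can also be related through the se.fit and residual.scale
components that predict() returns. The following is a sketch, assuming the
usual normal-errors linear model: the interval for the mean uses se.fit
alone, while the prediction interval adds the residual variance to it.

```r
dat <- data.frame(temp = c(10, 20, 30, 40, 50, 60, 70),
                  amnt = c(17, 21, 25, 28, 33, 40, 49))
mod <- lm(amnt ~ temp, data = dat)

p   <- predict(mod, data.frame(temp = 45), se.fit = TRUE)
fit <- unname(p$fit)
tc  <- qt(0.975, df = p$df)          # two-sided 95% critical value

# Half-width for the mean response: uncertainty in the fitted line only
half_conf <- tc * p$se.fit
# Half-width for a new observation: fitted-line uncertainty plus residual noise
half_pred <- tc * sqrt(p$se.fit^2 + p$residual.scale^2)

rbind(confidence = c(lwr = fit - half_conf, upr = fit + half_conf),
      prediction = c(lwr = fit - half_pred, upr = fit + half_pred))
```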

	-thomas



Peter Dalgaard

Fri, Nov 15, 2002 10:18 AM

"Fred Mellender" <fredm at frontiernet.net> writes:

The book is giving you a prediction interval, aka a tolerance
interval. Some people use the term "confidence interval" a bit too
sloppily. predict() will give you the other kind of interval if you
ask it to. Vice versa, 

E = tc*sC *sqrt(1/n + (x-xBar)^2/SSx) 

would give you the confidence interval for the predicted mean, I think.
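
A quick way to check this is to code both versions of E and compare them
with what predict() reports. The helper half_width() below is hypothetical
(it is not part of R), and its arguments are named only for illustration.

```r
# Hypothetical helper: half-width E of the interval at x0.
# new_obs = TRUE keeps the leading "1 +" (the book's formula);
# new_obs = FALSE drops it (the formula suggested above).
half_width <- function(x0, x, s, level = 0.95, new_obs = FALSE) {
  n   <- length(x)
  SSx <- sum((x - mean(x))^2)
  tc  <- qt(1 - (1 - level) / 2, df = n - 2)
  tc * s * sqrt(as.numeric(new_obs) + 1 / n + (x0 - mean(x))^2 / SSx)
}

dat <- data.frame(temp = c(10, 20, 30, 40, 50, 60, 70),
                  amnt = c(17, 21, 25, 28, 33, 40, 49))
mod <- lm(amnt ~ temp, data = dat)
yp  <- unname(predict(mod, data.frame(temp = 45)))
s   <- summary(mod)$sigma            # residual standard error

E_mean <- half_width(45, dat$temp, s)                  # without "1 +"
E_new  <- half_width(45, dat$temp, s, new_obs = TRUE)  # with "1 +"

rbind(mean_response   = c(lwr = yp - E_mean, upr = yp + E_mean),
      new_observation = c(lwr = yp - E_new,  upr = yp + E_new))
```

For this example the first row should agree with interval="confidence"
(about 30.61 to 35.32) and the second with interval="prediction" (about
26.49 to 39.44).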

O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907

Brian Ripley

Fri, Nov 15, 2002 10:33 AM

The problem is with your quote from the book.  That formula is not a
confidence interval; it is a tolerance interval.  Use predict.lm with
interval="prediction" to get it.

I suggest you get a better (or at least more understandable) book!


Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Matej Cepl

Fri, Nov 15, 2002 10:57 AM

Thomas Lumley wrote:

Slightly OT question to this thread:

How can I get critical values for a given distribution?
E.g. a function f which would give me 2.228 for ft(p=0.05,df=10)
(i.e., t for Student's distribution at a given probability level).

Sorry for the newbie question.

Matej



Peter Dalgaard

Fri, Nov 15, 2002 11:20 AM

Matej Cepl <matej at ceplovi.cz> writes:

[1] 2.228139
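
That value is the 0.975 quantile of Student's t distribution with 10 degrees
of freedom. As a sketch, assuming p=0.05 in the question means a two-sided
5% level, base R's quantile function qt() gives it; the names alpha, df,
crit and crit_upper are only illustrative.

```r
alpha <- 0.05   # two-sided significance level
df    <- 10     # degrees of freedom

# Two-sided critical value: put alpha/2 in each tail
crit <- qt(1 - alpha / 2, df)
# The same value, asked for via the upper tail directly
crit_upper <- qt(alpha / 2, df, lower.tail = FALSE)

c(crit, crit_upper)
```

The other distributions follow the same naming pattern, e.g. qnorm(),
qchisq() and qf().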

O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907