
Confidence Intervals on Standard Curve

6 messages · nzcoops, Ben Ward, David Winsemius

#
Hi, I wonder if anyone could advise me with this:

I've been trying to make a standard curve in R with lm() from some 
standards read on a spectrophotometer, so that I can express the curve 
as a formula and obtain values for my treated samples by plugging 
readings into it, instead of trying to judge things by eye from a 
curve drawn by hand.

It is a curve and so I used the following formula:

model <- lm(Approximate.Counts~X..Light.Transmission + 
I(Approximate.Counts^2), data=Standards)

It gives me a pretty decent graph:
xyplot(Approximate.Counts + fitted(model) ~ X..Light.Transmission, 
data=Standards)

I'm pretty happy with it, and looking at the model summary, to my 
inexperienced eyes it seems pretty good:

lm(formula = Approximate.Counts ~ X..Light.Transmission + 
I(Approximate.Counts^2),
     data = Standards)

Residuals:
    Min     1Q Median     3Q    Max
-91.75 -51.04  27.33  37.28  49.72

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)
(Intercept)              9.868e+02  2.614e+01   37.75 <2e-16 ***
X..Light.Transmission   -1.539e+01  8.116e-01  -18.96 <2e-16 ***
I(Approximate.Counts^2)  2.580e-04  6.182e-06   41.73 <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 48.06 on 37 degrees of freedom
Multiple R-squared: 0.9956,    Adjusted R-squared: 0.9954
F-statistic:  4190 on 2 and 37 DF,  p-value: < 2.2e-16

I tried to put some 95% confidence interval lines on a plot, as advised 
by my tutor, to see how they looked, and I used a function I found in 
"The R Book":

se.lines <- function(model) {
  b1 <- coef(model)[2] + summary(model)[[4]][4]
  b2 <- coef(model)[2] - summary(model)[[4]][4]
  xm <- mean(model[[12]][2])
  ym <- mean(model[[12]][1])
  a1 <- ym - b1 * xm
  a2 <- ym - b2 * xm
  abline(a1, b1, lty = 2)
  abline(a2, b2, lty = 2)
}
se.lines(model)

but when I do this on a plot I get an odd result:


They look to me to lie in the same kind of area that my regression 
line did before I used polynomial regression by squaring 
"Approximate.Counts":

lm(formula = Approximate.Counts ~ X..Light.Transmission + 
I(Approximate.Counts^2), data = Standards)
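For comparison, one common way to get 95% confidence lines that follow a curved fit is predict() with interval = "confidence", rather than shifting an intercept and slope by a standard error. The sketch below is self-contained: it uses simulated data in place of the Standards data frame, and a quadratic in x only (no y term on the right-hand side):

```r
# Sketch: 95% confidence bands from predict() on a quadratic fit.
# The data here are simulated stand-ins for the Standards data frame.
set.seed(1)
x <- seq(5, 95, length.out = 40)                  # % light transmission
y <- 3000 * exp(-0.05 * x) + rnorm(40, sd = 40)   # made-up counts

fit <- lm(y ~ x + I(x^2))                         # quadratic in x only

newx <- data.frame(x = seq(min(x), max(x), length.out = 100))
ci <- predict(fit, newdata = newx, interval = "confidence", level = 0.95)

plot(x, y, xlab = "% Light Transmission", ylab = "Approximate Counts")
lines(newx$x, ci[, "fit"])                        # fitted curve
matlines(newx$x, ci[, c("lwr", "upr")], lty = 2, col = "black")
```

Because predict() evaluates the band at many x values, the dashed lines curve along with the fit instead of staying straight.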

Is there something else I should be doing? I've seen several ways of 
dealing with non-linear relationships, from taking logs of certain 
variables, to quadratic regression, to using sin and other mathematical 
devices. I'm not completely sure whether I'm "allowed" to square the y 
variable; the book only squared the x variable in quadratic regression, 
which I tried first, and it fit quite well, but not as well as squaring 
Approximate Counts does:

model <- lm(Approximate.Counts~X..Light.Transmission + 
I(X..Light.Transmission^2), data=Standards)


Any advice is greatly appreciated; it's the first time I've really had 
to look at regression with coursework data that isn't a straight line.

Thanks,
Ben Ward.
#
I've just realised the couple of graphs I put on here have been stripped 
off. If anyone needs to see them and can't see my problem from the code, 
I can send them directly to anyone who thinks they can help.

Thanks,
Ben W.
On 18/02/2011 23:29, Ben Ward wrote:
#
model <- lm(Approximate.Counts~X..Light.Transmission +
I(Approximate.Counts^2), data=Standards) 

Might not be addressing the problem, but don't you have Y ~ X + Y^2 here?
That's a violation of the assumptions of an lm, isn't it?

Also, for plotting a CI on a curve, look into ggplot2::geom_ribbon; it's
much nicer than just plotting lines and is easy to use. had.co.nz should
set you right for setting this up.
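A minimal sketch of that suggestion, assuming ggplot2 is installed: a shaded 95% confidence band around a quadratic fit. The data are simulated; the poster's real columns are Approximate.Counts and X..Light.Transmission.

```r
# Sketch: geom_ribbon confidence band around a fitted curve.
# Simulated data stand in for the poster's Standards data frame.
library(ggplot2)
set.seed(1)
d <- data.frame(x = seq(5, 95, length.out = 40))
d$y <- 3000 * exp(-0.05 * d$x) + rnorm(40, sd = 40)

m <- lm(y ~ x + I(x^2), data = d)
band <- as.data.frame(predict(m, interval = "confidence"))

p <- ggplot(cbind(d, band), aes(x = x, y = y)) +
  geom_point() +
  geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.3) +
  geom_line(aes(y = fit))
p  # points, shaded 95% band, and the fitted curve
```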
#
It is. I tried a glm with a poisson distribution, as was suggested to me 
previously, but the residual deviance was too high - the book I'm 
reading says this suggests overdispersion, because it's far above the 
residual degrees of freedom:

glm(formula = Approximate.Counts ~ X..Light.Transmission, family = poisson,
     data = Standard)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-6.434  -2.822  -1.257   4.319   9.432

Coefficients:
                         Estimate Std. Error z value Pr(>|z|)
(Intercept)            7.7584256  0.0058981  1315.4 <2e-16 ***
X..Light.Transmission -0.0497474  0.0004951  -100.5 <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

     Null deviance: 15709.90  on 39  degrees of freedom
Residual deviance:   671.91  on 38  degrees of freedom
AIC: 1030.7

Number of Fisher Scoring iterations: 4
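As a quick check of the overdispersion the book describes, the ratio of residual deviance to residual degrees of freedom can be computed directly; the numbers below are taken from the printout above:

```r
# Rough overdispersion check: residual deviance over residual df.
# A ratio near 1 is consistent with the poisson assumption;
# far above 1 suggests overdispersion.
resid_dev <- 671.91   # from the glm printout above
resid_df  <- 38
ratio <- resid_dev / resid_df
ratio   # about 17.7, far above 1

# With a fitted model object the same check is:
# fit$deviance / fit$df.residual
```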

It suggested using quasi-poisson, but that didn't alter anything, so I 
then used a negative binomial and got this result:

glm.nb(formula = Approximate.Counts ~ X..Light.Transmission,
     data = Standard, init.theta = 67.14797983, link = log)

Deviance Residuals:
     Min       1Q   Median       3Q      Max
-1.5281  -0.9208  -0.1883   0.9685   2.0121

Coefficients:
                       Estimate Std. Error z value Pr(>|z|)
(Intercept)            7.69797    0.02819  273.11 <2e-16 ***
X..Light.Transmission -0.04410    0.00141  -31.26 <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(67.148) family taken to be 1)

     Null deviance: 905.35  on 39  degrees of freedom
Residual deviance:  40.39  on 38  degrees of freedom
AIC: 517.21

Number of Fisher Scoring iterations: 1


               Theta:  67.1
           Std. Err.:  16.1

  2 x log-likelihood:  -511.21

This shows a much lower residual deviance and AIC. When I plot it, it is 
also pretty good, though I'm yet to assess it with plot(model). This is 
my first proper go with GLMs, so I'm not confident at evaluating them, 
although I am aware that a better model of the data has a lower AIC, and 
that the residual deviance should not be orders above the degrees of freedom.

However, the

Y ~ X + Y^2

model produces the best-fitting line - it is pretty much on the data 
points. I'm trying to make a standard curve with which to take readings 
from a spectrophotometer, rather than what I would normally use models 
for, such as hypothesis testing and analysis of data from experiments.
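A sketch of that intended use, with simulated stand-ins for the standards: fit the negative binomial curve (glm.nb is in the MASS package, which ships with R) and read estimated counts off new % transmission values with predict(..., type = "response").

```r
# Sketch: using a fitted negative binomial model as a standard curve,
# i.e. converting new % transmission readings into estimated counts.
# Data are simulated stand-ins for the poster's standards.
library(MASS)
set.seed(1)
x  <- rep(seq(10, 90, by = 10), each = 5)    # 5 readings per standard
mu <- exp(7.7 - 0.045 * x)                   # made-up true mean counts
y  <- rnbinom(length(x), mu = mu, size = 60) # overdispersed counts

fit <- glm.nb(y ~ x)

# New spectrophotometer readings from treated samples:
new_readings <- data.frame(x = c(23.5, 47.0, 71.2))
est <- predict(fit, newdata = new_readings, type = "response")
est   # estimated bacterial counts, one per reading
```

type = "response" undoes the log link, so the estimates come back on the count scale rather than as log-counts.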

Thanks,
Ben.
On 20/02/2011 11:53, nzcoops wrote:
#
On Feb 20, 2011, at 1:27 PM, Ben Ward wrote:

I thought we were leaving behind the model that had the dependent 
variable on both sides of the equation. Can you explain how you would 
construct a chart or a function that will turn those results into 
something useful?
David Winsemius, MD
West Hartford, CT
#
On 20/02/2011 18:52, David Winsemius wrote:
Sorry for any confusion; I am indeed leaving behind the Y ~ X + Y^2 
model. I was only mentioning that it had a higher R squared than 
anything else I had done.

I'll explain exactly what the chart/function is made from and is for, to 
remove any mystery or highlight anything important. The curve I'm trying 
to produce is from spectrophotometer readings of a series of standards 
called the McFarland scale: tubes of solution made up to simulate 
different known numbers of bacteria in a liquid sample. The standards 
are of different intensities of cloudiness, representing different 
numbers of bacteria in a liquid medium. The (X) axis is % light 
transmission as measured by the spectrophotometer - the cloudier the 
standard, the lower this value - and the Bacterial Counts on the (Y) 
axis rise as a sample (or standard) gets cloudier, so counts are higher 
at lower % Light Transmission. This is a negative curve which never 
goes below 0, because you can't have less than no bacteria.

This curve/function, made from the standards of known numbers, will 
then be used to estimate the numbers of bacteria present in proper 
samples rather than known standards: I would take the % Light 
Transmission readings the spectrophotometer gives me for those samples, 
and then, using the curve - or working through the function - calculate 
the (approximate) number of bacteria present. I took 5 % Light 
Transmission readings for each standard, to make sure the machine 
wasn't giving me wildly different readings (it wasn't), and used them 
to try and construct a curve.
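Those five replicate readings per standard can be collapsed to per-standard means with aggregate() before fitting; a tiny sketch with invented column names and values:

```r
# Averaging replicate spectrophotometer readings per standard.
# Column names and values are invented for illustration.
readings <- data.frame(
  standard = rep(c("S1", "S2", "S3"), each = 5),
  transmission = c(90, 91, 89, 90, 90,
                   70, 71, 69, 70, 70,
                   50, 51, 49, 50, 50)
)
means <- aggregate(transmission ~ standard, data = readings, FUN = mean)
means   # one mean % transmission per standard
```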

Hence my comment about Y ~ X + Y^2 giving me a better fit (R-squared): 
I need a good fit to make reliable predictions. However, since then 
I've added an extra standard from the scale (there are about 10/11 
standards in the scale, but I've got 9, because the last few can't be 
made up accurately without introducing error or the lab techs running 
into difficulty doing it by hand), and the Y ~ X + Y^2 curve is no 
longer sufficient.

I followed a suggestion (I think by you) to try a GLM with the poisson 
distribution, as I mentioned in my last email that went around, but it 
didn't fit as closely as I would like. It's my first experience with 
GLMs, but to my understanding the residual deviance was orders higher 
than the residual degrees of freedom. So, under instruction of a 
textbook, I tried a quasi-poisson, but still got the same result. Then 
(again a textbook suggestion) I tried a negative binomial distribution, 
and got a much lower AIC value and a residual deviance much closer to 
the residual degrees of freedom (the R printouts of both are in the 
previous email I sent around).

I hope this has answered your question, "Can you explain how you would 
construct a chart or a function that will turn those results into 
something useful?"

Thanks,
Ben.W