Hi, I wonder if anyone could advise me with this:
I've been trying to make a standard curve in R with lm() of some
standards from a spectrophotometer, so as I can express the curve as a
formula, and so obtain values from my treated samples by plugging in
readings into the formula, instead of trying to judge things by eye,
with a curve drawn by hand.
It is a curve and so I used the following formula:
model <- lm(Approximate.Counts~X..Light.Transmission +
I(Approximate.Counts^2), data=Standards)
It gives me a pretty decent graph:
xyplot(Approximate.Counts + fitted(model) ~ X..Light.Transmission,
data=Standards)
I'm pretty happy with it, and looking at the model summary, to my
inexperienced eyes it seems pretty good:
lm(formula = Approximate.Counts ~ X..Light.Transmission +
I(Approximate.Counts^2),
data = Standards)
Residuals:
Min 1Q Median 3Q Max
-91.75 -51.04 27.33 37.28 49.72
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.868e+02 2.614e+01 37.75 <2e-16 ***
X..Light.Transmission -1.539e+01 8.116e-01 -18.96 <2e-16 ***
I(Approximate.Counts^2) 2.580e-04 6.182e-06 41.73 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 48.06 on 37 degrees of freedom
Multiple R-squared: 0.9956, Adjusted R-squared: 0.9954
F-statistic: 4190 on 2 and 37 DF, p-value: < 2.2e-16
I tried to put some 95% confidence interval lines on a plot, as advised
by my tutor, to see how they looked, and I used a function I found in
"The R Book":
se.lines <- function(model){
b1<-coef(model)[2]+ summary(model)[[4]][4]
b2<-coef(model)[2]- summary(model)[[4]][4]
xm<-mean(model[[12]][2])
ym<-mean(model[[12]][1])
a1<-ym-b1*xm
a2<-ym-b2*xm
abline(a1,b1,lty=2)
abline(a2,b2,lty=2)
}
se.lines(model)
but when I do this on a plot I get an odd result:
They looks to me, to lie in the same kind of area, that my regression
line did, before I used polynomial regression, by squaring
"Approximate.Counts":
lm(formula = Approximate.Counts ~ X..Light.Transmission +
I(Approximate.Counts^2), data = Standards)
Is there something else I should be doing? I've seen several ways of
dealing with non-linear relationships, from log's of certain variables,
and quadratic regression, and using sin and other mathematical devices.
I'm not completely sure if I'm "allowed" to square the y variable, the
book only squared the x variable in quadratic regression, which I did
first, and it fit quite well, but not as good squaring Approximate
Counts does:
model <- lm(Approximate.Counts~X..Light.Transmission +
I(X..Light.Transmission^2), data=Standards)
Any advice is greatly appreciated, it's the first time I've really had
to look at regression with data in my coursework that isn't a straight line.
Thanks,
Ben Ward.
Confidence Intervals on Standard Curve
6 messages · nzcoops, Ben Ward, David Winsemius
I've just realised the couple of graphs I put on here have been stripped off. If anyone has to see them and can't see my problem from code, I can send them directly to anyone who thinks they can help but wants to see them. Thanks, Ben W.
On 18/02/2011 23:29, Ben Ward wrote:
Hi, I wonder if anyone could advise me with this:
I've been trying to make a standard curve in R with lm() of some
standards from a spectrophotometer, so as I can express the curve as a
formula, and so obtain values from my treated samples by plugging in
readings into the formula, instead of trying to judge things by eye,
with a curve drawn by hand.
It is a curve and so I used the following formula:
model <- lm(Approximate.Counts~X..Light.Transmission +
I(Approximate.Counts^2), data=Standards)
It gives me a pretty decent graph:
xyplot(Approximate.Counts + fitted(model) ~ X..Light.Transmission,
data=Standards)
I'm pretty happy with it, and looking at the model summary, to my
inexperienced eyes it seems pretty good:
lm(formula = Approximate.Counts ~ X..Light.Transmission +
I(Approximate.Counts^2),
data = Standards)
Residuals:
Min 1Q Median 3Q Max
-91.75 -51.04 27.33 37.28 49.72
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.868e+02 2.614e+01 37.75 <2e-16 ***
X..Light.Transmission -1.539e+01 8.116e-01 -18.96 <2e-16 ***
I(Approximate.Counts^2) 2.580e-04 6.182e-06 41.73 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 48.06 on 37 degrees of freedom
Multiple R-squared: 0.9956, Adjusted R-squared: 0.9954
F-statistic: 4190 on 2 and 37 DF, p-value: < 2.2e-16
I tried to put some 95% confidence interval lines on a plot, as
advised by my tutor, to see how they looked, and I used a function I
found in "The R Book":
se.lines <- function(model){
b1<-coef(model)[2]+ summary(model)[[4]][4]
b2<-coef(model)[2]- summary(model)[[4]][4]
xm<-mean(model[[12]][2])
ym<-mean(model[[12]][1])
a1<-ym-b1*xm
a2<-ym-b2*xm
abline(a1,b1,lty=2)
abline(a2,b2,lty=2)
}
se.lines(model)
but when I do this on a plot I get an odd result:
They looks to me, to lie in the same kind of area, that my regression
line did, before I used polynomial regression, by squaring
"Approximate.Counts":
lm(formula = Approximate.Counts ~ X..Light.Transmission +
I(Approximate.Counts^2), data = Standards)
Is there something else I should be doing? I've seen several ways of
dealing with non-linear relationships, from log's of certain
variables, and quadratic regression, and using sin and other
mathematical devices. I'm not completely sure if I'm "allowed" to
square the y variable, the book only squared the x variable in
quadratic regression, which I did first, and it fit quite well, but
not as good squaring Approximate Counts does:
model <- lm(Approximate.Counts~X..Light.Transmission +
I(X..Light.Transmission^2), data=Standards)
Any advice is greatly appreciated, it's the first time I've really had
to look at regression with data in my coursework that isn't a straight
line.
Thanks,
Ben Ward.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
model <- lm(Approximate.Counts~X..Light.Transmission + I(Approximate.Counts^2), data=Standards) Might not be addressing the problem, don't you have Y ~ X + Y^2 here? That's a violation of the assumptions of an lm isn't it? Also for plotting CI on a curve look into ggplot2::geom_ribbon, it's much nicer than just plotting lines and is easy to use. had.co.nz should set you right for setting this up.
View this message in context: http://r.789695.n4.nabble.com/Confidence-Intervals-on-Standard-Curve-tp3313850p3315071.html Sent from the R help mailing list archive at Nabble.com.
It is, I tried a glm with a poisson distribution, as was suggested to me
previously, but the Residual Deviance was too high - the book I'm
reading says it suggests overdispersion because it's way above the
Residual degrees of freedom:
glm(formula = Approximate.Counts ~ X..Light.Transmission, family = poisson,
data = Standard)
Deviance Residuals:
Min 1Q Median 3Q Max
-6.434 -2.822 -1.257 4.319 9.432
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 7.7584256 0.0058981 1315.4 <2e-16 ***
X..Light.Transmission -0.0497474 0.0004951 -100.5 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 15709.90 on 39 degrees of freedom
Residual deviance: 671.91 on 38 degrees of freedom
AIC: 1030.7
Number of Fisher Scoring iterations: 4
It suggested using quasi-poisson but that didn't alter anything, so then
used a negative binomial and gained the result:
glm.nb(formula = Approximate.Counts ~ X..Light.Transmission,
data = Standard, init.theta = 67.14797983, link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.5281 -0.9208 -0.1883 0.9685 2.0121
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 7.69797 0.02819 273.11 <2e-16 ***
X..Light.Transmission -0.04410 0.00141 -31.26 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Negative Binomial(67.148) family taken to be 1)
Null deviance: 905.35 on 39 degrees of freedom
Residual deviance: 40.39 on 38 degrees of freedom
AIC: 517.21
Number of Fisher Scoring iterations: 1
Theta: 67.1
Std. Err.: 16.1
2 x log-likelihood: -511.21
Which shows a much lower Residual Deviance, and AIC. I plot it and It is
also pretty good. I'm yet to assess it with plot(model). This is my
first proper go with GLM so I'm not confident at evaluating them,
although I am aware that a better model of the data has lower AIC, and
Residual Deviance should not be orders above the degrees of freedom.
However, the
Y ~ X + Y^2
Produces the best fitting line - it is pretty much on the data points -
I'm trying to make a standard curve, with which to take readings from a
spectrophotometer off of. Rather than what I would normally use models
for - such as hypothesis testing and analysis of data from experiments.
Thanks,
Ben.
On 20/02/2011 11:53, nzcoops wrote:
model<- lm(Approximate.Counts~X..Light.Transmission + I(Approximate.Counts^2), data=Standards) Might not be addressing the problem, don't you have Y ~ X + Y^2 here? That's a violation of the assumptions of an lm isn't it? Also for plotting CI on a curve look into ggplot2::geom_ribbon, it's much nicer than just plotting lines and is easy to use. had.co.nz should set you right for setting this up.
On Feb 20, 2011, at 1:27 PM, Ben Ward wrote:
However, the Y ~ X + Y^2 Produces the best fitting line - it is pretty much on the data points - I'm trying to make a standard curve, with which to take readings from a spectrophotometer off of. Rather than what I would normally use models for - such as hypothesis testing and analysis of data from experiments.
I thought we were leaving behind the model that had the dependent variable on both sides of hte equation. Can you explain how you would construct a chart or a function that will turn those results into something useful?
Thanks, Ben. On 20/02/2011 11:53, nzcoops wrote:
model<- lm(Approximate.Counts~X..Light.Transmission + I(Approximate.Counts^2), data=Standards) Might not be addressing the problem, don't you have Y ~ X + Y^2 here? That's a violation of the assumptions of an lm isn't it? Also for plotting CI on a curve look into ggplot2::geom_ribbon, it's much nicer than just plotting lines and is easy to use. had.co.nz should set you right for setting this up.
David Winsemius, MD West Hartford, CT
On 20/02/2011 18:52, David Winsemius wrote:
On Feb 20, 2011, at 1:27 PM, Ben Ward wrote:
However, the Y ~ X + Y^2 Produces the best fitting line - it is pretty much on the data points - I'm trying to make a standard curve, with which to take readings from a spectrophotometer off of. Rather than what I would normally use models for - such as hypothesis testing and analysis of data from experiments.
I thought we were leaving behind the model that had the dependent variable on both sides of hte equation. Can you explain how you would construct a chart or a function that will turn those results into something useful?
Sorry for any confusion, I am indeed leaving behind the Y ~ X + Y^2 model, I was only mentioning that it had a higher R squared than anything else I had done. I'll explain exactly what the chart/function is made from and is for, to remove any mystery or highlight anything important: The curve I'm trying to produce, is from readings from a spectrophotometer, from series of standards called the MacFarland Scale - tubes of solution which were made up to simulate different known numbers of bacteria in a liquid sample: the standards are of different intensities of coloudy-ness, and they represent different numbers of bacteria in a liquid medium. The (X) axis is % light transmission as measured by the spectrophotometer, the cloudier the standard, the lower this value, and the Bacterial Counts on the (Y) axis, are higher, the cloudier a sample (or the standards) are, thus it is higher with lower % Light Transmission. This is a negative curve, which never goes below 0, because you cant have less than no bacteria. This curve/function, which has been made from the standards of known numbers, will then be used to estimate the numbers of bacteria present in proper samples, rather than known standards. I would take the % Light Transmission Readings the spectrophotometer gives me for those standards, and then using the curve - or working through the function, calculate the (Approximate) number of bacteria present in my samples. I took 5, % Light Transmission readings for each standard to make sure the machine wasn't giving me wildly different readings (it' wasnt), and used them to try and construct a curve. Hence my comment about Y ~ X + Y^2 giving me a better fit (R-Squared). I need a good fit to make reliable predictions. However, since then, I've added an extra standard from the scale (There are about 10/11 standards to the scale, but I've got 9, because the last few can't be made up accurately without introducing error or the lab tech's running into difficulty doing it by hand), and the Y ~ X + Y^2 curve is no longer sufficient after this. I followed a suggestion (I think by you) to try a GLM, with the poisson distribution, as I mentioned in my last email that went around, which wasn't closely fitting as I would like, it's my first experience with GLM, but to my understanding, the Residual Deviance, was orders higher than the residual degrees of freedom. So under instruction of a textbook, I tried with a quasi-poisson, but still got the same result. Then (again textbook suggestion) tried with a negative binomial distribution, and got a much lower AIC value and a residual deviance much much closer to the residual degrees of freedom (the R print outs of them are on the previous email I sent around). I hope this has explained when you asked "Can I explain how I would construct a chart or a function that will turn those results into something useful?" Thanks, Ben.W
Thanks, Ben. On 20/02/2011 11:53, nzcoops wrote:
model<- lm(Approximate.Counts~X..Light.Transmission + I(Approximate.Counts^2), data=Standards) Might not be addressing the problem, don't you have Y ~ X + Y^2 here? That's a violation of the assumptions of an lm isn't it? Also for plotting CI on a curve look into ggplot2::geom_ribbon, it's much nicer than just plotting lines and is easy to use. had.co.nz should set you right for setting this up.
David Winsemius, MD West Hartford, CT