Hi
I have measured the UV absorbance (abs) of 10 solutions of a substance at
known concentrations (conc) and have used a linear model to plot a
calibration graph with confidence limits. I now want to predict the
concentration of solutions with UV absorbance results given in the new.abs
data.frame; however, predict.lm only appears to work for new "conc"
variables, not new "abs" variables.
I have searched the help files and did find a similar problem in June 2000,
but unfortunately no solution was offered.
Any help on how to use predict.lm with the new "abs" data to predict "conc"
with confidence limits would be appreciated.
conc<-seq(100, 280, 20) # mg/l
abs<-c(1.064, 1.177, 1.303, 1.414, 1.534, 1.642, 1.744, 1.852, 1.936,
2.046) # absorbance units
lm.calibration<-lm(abs ~ conc)
pred.w.plim <- predict(lm.calibration, interval="prediction")
pred.w.clim <- predict(lm.calibration, interval="confidence")
matplot(conc, cbind(pred.w.clim, pred.w.plim[,-1]),
lty=c(1,2,2,3,3), type="l", ylab="abs", xlab= "conc mg/l")
points(conc, abs, pch=21, col="blue")
new.abs<-data.frame(abs=c(1.251, 1.324, 1.452))
predict(calibration.lm, new.abs) # does not work
Thanks
Mike White
Help with predict.lm
7 messages · Mike White, (Ted Harding)
On 19-Apr-05 Mike White wrote:
[...]
new.abs<-data.frame(abs=c(1.251, 1.324, 1.452))
predict(calibration.lm, new.abs) # does not work
Apart from the apparent typo "calibration.lm" which Matthias pointed out,
this will not work anyway, since "predict" predicts in the same direction
as the model was fitted in the first place. You have fitted "abs ~ conc",
so the lm object lm.calibration contains a representation of the model
equivalent to

  abs = a + b*conc

When you use predict(calibration.lm, new.abs) it will expect the data
frame "new.abs" to contain values in a list named "conc". Since you have
supplied a list named "abs", this is apparently ignored, and as a result
the function uses the default data frame, which is the data encapsulated
in the lm object. You can verify this by comparing the outputs of

  predict(lm.calibration, new.abs)

and

  predict(lm.calibration)

You will see that they are identical! Either way, "predict" predicts "abs"
from "conc" by using the above equation. It does not attempt to invert the
equation, even if you call the "new" data "abs".

Given the apparently extremely close fit of your data to a straight line,
you are likely to get perfectly adequate results simply by doing the
regression the other way round, i.e.
lm.calibration2<-lm(conc ~ abs)
new.abs<-data.frame(abs=c(1.251, 1.324, 1.452))
predict(lm.calibration2,new.abs)
       1        2        3
131.3813 144.7453 168.1782

which looks about right!

Strictly speaking, this approach is somewhat invalid, since under the model

  abs = a + b*conc + error

the "inverse regression"

  conc = (abs - a)/b = A + B*abs

does not have the standard statistical properties, because of the
theoretical possibility (P > 0) that the estimated b can be arbitrarily
close to 0 (with the result that, theoretically, the estimate of B = 1/b
has neither expectation nor variance), and the estimates of A and B could
be a long way out.

Also, the confidence intervals you would get by using the residuals from
this "inverse regression" in the usual way would not be valid, since they
would be finite for all values of the confidence level. In reality, for a
sufficiently high confidence level (but still short of 100%) you will get
a confidence "interval" consisting of several parts with gaps between
them, or an infinite confidence interval. This occurs at confidence levels
corresponding to the significance level at which you can no longer reject
the hypothesis "b = 0" in the first equation. In your case this
significance level would be extremely small (I get 2.71e-12), so you can
use confidence levels of at least 99.999999999% before you need to worry
much about the confidence interval. In your case, where there seems to be
a very tight linear relationship, misrepresentation of the inverse
relationship due to this effect is very unlikely to have occurred.

The correct approach has in the past been the subject of at times quite
controversial discussion, under the title indeed of "The Calibration
Problem". Nowadays this problem would be approached by making the
concentrations to be "predicted" additional unknown parameters, and
evaluating likelihood ratios for possible values of these. I don't have
time at the moment to go into this approach, but will try to write
something later.
It seems there is nothing in R at present for this kind of general (and
basically simple) use, though maybe there is something usable in the
package mscalib (which I have not studied); however, by the look of the
contents, the routines therein may be too elaborately specialised towards
a specific type of application to be convenient for general use in
conjunction with lm, so it might be worth spelling it out for people who
would like to implement it themselves.

Till later,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 19-Apr-05 Time: 15:20:26
------------------------------ XFMail ------------------------------
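The "identical outputs" check described above is easy to run directly. A small self-contained sketch (the fallback happens because, with "conc" absent from newdata, model.frame looks conc up in the formula's environment, i.e. the original data; recent versions of R may also warn about the row mismatch):

```r
# refit the calibration exactly as in the original post
conc <- seq(100, 280, 20)   # mg/l
abs <- c(1.064, 1.177, 1.303, 1.414, 1.534, 1.642, 1.744, 1.852, 1.936, 2.046)
lm.calibration <- lm(abs ~ conc)
new.abs <- data.frame(abs = c(1.251, 1.324, 1.452))

# new.abs has no "conc" column, so both calls return the same
# 10 fitted values (R may also warn that the row counts differ)
p1 <- predict(lm.calibration, new.abs)
p2 <- predict(lm.calibration)
isTRUE(all.equal(p1, p2))
```

Both calls return the 10 fitted absorbances at the original conc values; nothing is inverted.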
Ted

I agree that with the example data the "inverse regression" approach would
be adequate, but it would be useful to have a function to predict the
results and confidence intervals correctly. I look forward to your
solution to the calibration problem.

In the meantime, I have looked at the calib function referred to by Andy
Liaw (http://finzi.psych.upenn.edu/R/Rhelp02a/archive/1202.html), but the
formula for the confidence intervals appears to be incorrect (at least
according to "Quality Assurance in Analytical Chemistry", W. Funk,
V. Dammann and G. Donnevert). The brackets multiplied by term1 should have
been raised to the power 0.5 (square root), not 2 (squared). Also the
function needs to be multiplied by the t value for the appropriate
probability and degrees of freedom. My corrected code is shown below; any
suggestions for calculating t?

# R function to do classical calibration
# input the x and y vectors and the value of y to make an estimate for
# val is a value of dep
# returns the calibrated value of x and its single confidence
# interval about x
# form of calibration particularly suited to jackknifing
calib <- function(indep, dep, val)
{
  # generate the model y on x and get out predicted values for x
  reg <- lm(dep~indep)
  xpre <- (val - coef(reg)[1])/coef(reg)[2]
  # generate a confidence interval
  yyat <- ((dep - predict(reg))^2)
  sigyyat <- ((sum(yyat)/(length(dep)-2))^0.5)
  term1 <- sigyyat/coef(reg)[2]
  sigxxbar <- sum((indep - mean(indep))^2)
  denom <- sigxxbar * ((coef(reg)[2])^2)
  t <- ## needs function to assign t value from t-table
  conf <- t*abs(((1+(1/length(dep))+(((val - mean(dep))^2)/denom))^0.5)*term1)
  ## squared term changed to square root
  results <- list(Predicted=xpre, Confidence=conf)
  ## returns results as list to avoid warning message
  return(results)
}

Thanks
Mike White

----- Original Message -----
From: "Ted Harding" <Ted.Harding at nessie.mcc.ac.uk>
To: "Mike White" <mikewhite.diu at tiscali.co.uk>
Cc: <R-help at stat.math.ethz.ch>
Sent: Tuesday, April 19, 2005 3:20 PM
Subject: RE: [R] Help with predict.lm
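On the question above of how to calculate t: R's qt function supplies the two-sided Student-t quantile directly (this is in fact the route the corrected code later in this thread takes). A minimal sketch:

```r
# two-sided t value for confidence level prob, with n - 2 degrees of
# freedom (n calibration points, 2 fitted coefficients)
n <- 10
prob <- 0.95
t <- qt(prob + (1 - prob)/2, df = n - 2)
round(t, 3)   # 2.306 for 95% confidence and 8 d.f.
```

qt(p, df) is the quantile function of the t distribution, so the two-sided value uses p = prob + (1 - prob)/2.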
[...]
On 19-Apr-05 Ted Harding wrote:
[...]
The correct approach has in the past been the subject of at times quite controversial discussion, under the title indeed of "The Calibration Problem". Nowadays this problem would be approached by making the concentrations to be "predicted" additional unknown parameters, and evaluating likelihood ratios for possible values of these. I don't have time at the moment to go into this approach, but will try to write something later.
Here is the basic theory, and an associated method, for
the case of estimating 1 value of X ("conc") corresponding
to 1 given value of Y ("abs"), when data x1,...,xn and
y1,...,yn have been obtained (your data "conc" and "abs"
above, respectively).
The model
y = a + b*x + e
is assumed, where the x-values are given (as typically in
a calibration experiment -- e.g. measuring standard test
samples of known x-value), and the y-values are measured
according to the above, with errors e assumed distributed
as N(0,s^2).
For simplicity, centre the x and y observations at their
means, so that Sum[xi]=0 and Sum[yi]=0.
Let an observation Y of y be given; we want to infer the
corresponding value of X. (X is measured from the mean of
the xi, Y from the mean of the yi.)
Let X play the role of an additional parameter in the
problem. Then the likelihood function for (a,b,s,X) is
  (1/s^(n+1)) * exp(-(Sum[(y-a-b*x)^2] + (Y-a-b*X)^2)/(2*s^2))
The MLEs a#, b#, s#, X# of a, b, s, X are then given by
a# = 0
b# = (Sum[xy])/(Sum[x^2])
X# = Y/b#
(s#)^2 = (1/(n+1))*Sum[(y - (b#)*x)^2]
Now suppose that X is set at a fixed value (again denoted by X).
Then the MLEs a~, b~, s~ of a, b and s are now given by
a~ = (Y*Sum[x^2] - X*Sum[x*y])/D
b~ = ((n+1)*Sum[x*y] + n*X*Y)/D
(s~)^2 = (Sum[(y - a~ - (b~)*x)^2] + (Y - a~ - (b~)*X)^2)/(n+1)
where
D = (n+1)*Sum[x^2] + n*X^2
The likelihood ratio (profile likelihood for X) of the hypothesis
that X has a fixed value (X) versus the hypothesis that X might
be anything is therefore
((s#)/(s~))^(n+1)
which depends only on
R(X) = (s#)^2/(s~)^2
where s~ (but not s#) depends on X.
Now
(n-2)*((s~)^2 - (s#)^2)/((s#)^2) = (n-2)*(1 - R(X))/R(X)
has the F distribution with 1 and (n-2) d.f., quite independent
of the true values of a, b and s^2, and large values of R(X)
correspond to small values of this "F ratio".
Hence the MLE of X is Y/b# and a confidence set for X at
a given confidence level P0 is the set of all X such that
(n-2)*(1 - R(X))/R(X) < qf(P0,1,n-2)
(in the notation of R's F functions pf, qf, rf, etc.)
The following function implements the above expressions.
It is a very crude approach to solving the problem, and
I'm sure that a more thoughtful approach would lead more
directly to the answer, but at least it gets you there
eventually.
===========================================
R.calib <- function(x,y,X,Y){
  # centre the data at the means of x and y
  n<-length(x) ; mx<-mean(x) ; my<-mean(y)
  x<-(x-mx) ; y<-(y-my) ; X<-(X-mx) ; Y<-(Y-my)
  # unrestricted MLEs a#, b#, X#, (s#)^2
  ah<-0 ; bh<-(sum(x*y))/(sum(x^2)) ; Xh <- Y/bh
  sh2 <- (sum((y-ah-bh*x)^2))/(n+1)
  # MLEs a~, b~, (s~)^2 with X held fixed
  D<-(n+1)*sum(x^2) + n*X^2
  at<-(Y*sum(x^2) - X*sum(x*y))/D ; bt<-((n+1)*sum(x*y) + n*X*Y)/D
  st2<-(sum((y - at - bt*x)^2) + (Y - at - bt*X)^2)/(n+1)
  # likelihood ratio R(X) and the corresponding F ratio
  R<-(sh2/st2)
  F<-(n-2)*(1-R)/R
  # undo the centring before returning
  x<-(x+mx) ; y<-(y+my)
  X<-(X+mx) ; Y<-(Y+my) ; Xh<-(Xh+mx)
  PF<-(pf(F,1,(n-2)))
  list(x=x,y=y,X=X,Y=Y,R=R,F=F,PF=PF,
       ahat=ah,bhat=bh,sh2=sh2,
       atil=at,btil=bt,st2=st2,
       Xhat=Xh)
}
============================================
Now let's take your original data and the first Y-value
in your list (namely Y = 1.251), and suppose you want
a 95% confidence interval for X. The X-value corresponding
to Y which you would get by regressing x (conc) on y (abs)
is X = 131.3813 so use this as a "starting value".
So now run this function with x<-conc, y<-abs, and these values
of X and Y:
R.calib(x,y,131.3813,1.251)
You get a long list of stuff, amongst which
$PF
[1] 0.02711878
and
$Xhat
[1] 131.2771
So now you know that Xhat (the MLE of X for that Y) = 131.2771
and the F-ratio probability is 0.027...
You want to push $PF upwards till it reaches 0.05, so work
*outwards* in the X-value:
R.calib(x,y,131.4000,1.251)$PF
[1] 0.03198323
R.calib(x,y,131.4500,1.251)$PF
[1] 0.0449835
...
R.calib(x,y,131.4693,1.251)$PF
[1] 0.04999851
and you're there in that direction. Now go back to the MLE
and work out in the other direction:
R.calib(x,y,131.2771,1.251)$PF
[1] 1.987305e-06
R.calib(x,y,131.2000,1.251)$PF
[1] 0.02005908
R.calib(x,y,131.1000,1.251)$PF
[1] 0.04604698
...
R.calib(x,y,131.0847,1.251)$PF
[1] 0.0500181
and again you're there.
So now you have the MLE Xhat = 131.2771, and the two
limits of a 95% confidence interval (131.0847, 131.4693)
for X, corresponding to the given value 1.251 of Y.
I leave it to you (or any other interested parties)
to wrap this basic method in a proper routine which
will avoid the above groping (but at least it shows
explicitly what's going on).
As a refinement: In your original query, you put up
three values of Y: 1.251, 1.324, 1.452
Now you could, of course, simply apply the above
method to each one separately as above.
However, the same theoretical approach can be used
to obtain a joint confidence region for the three
corresponding X values jointly, and this is theoretically
more correct, since these are 3 extra parameters and
they will have a covariance structure. You would need
to work through the algebra for k such Y-values and the
corresponding X-values.
In practice this may not matter much -- probably not
at all for your data.
Note: The above analysis of the single-case situation
was exhibited in an address I gave to a modelling symposium
in 1985, subsequently published along with the other
presentations in a special number of The Statistician:
Modelling: the classical approach
The Statistician (1986) vol 35, pp. 115-134
NB that the equation for alpha-tilde (corresponding to a~ above)
in that article has a misprint ("Y" missing before Sum(x^2)).
This analysis had something in common with work by Phil Brown:
P.J. Brown (1982) Multivariate Calibration
JRSS Ser. B, vol 44, pp. 287-321
though Brown's approach did not start from the likelihood
ratio principle.
Hoping this helps!
Best wishes,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 19-Apr-05 Time: 22:27:01
------------------------------ XFMail ------------------------------
Sorry, I was doing this too late last night! All stands as before except for the calculation at the end which is now corrected as follows:
On 19-Apr-05 Ted Harding wrote:
[code repeated for reference]
*****> You want to push $PF upwards till it reaches 0.05, so work
*****> *outwards* in the X-value:

WRONG!! Till it reaches ***0.95***

R.calib(x,y,125.0000,1.251)$PF
[1] 0.9301972
...
R.calib(x,y,124.3510,1.251)$PF
[1] 0.949989

and you're there in that direction. Now go back to the MLE
and work out in the other direction:

R.calib(x,y,131.2771,1.251)$PF
[1] 1.987305e-06
...
R.calib(x,y,138.0647,1.251)$PF
[1] 0.95

and again you're there.
So now you have the MLE Xhat = 131.2771, and the two

****> limits of a 95% confidence interval (131.0847, 131.4693)

WRONG!!! limits of a confidence interval (124.3510, 138.0647)
for X, corresponding to the given value 1.251 of Y.
As it happens, these seem to correspond very closely to what you would
get by reading "predict" in reverse:

plot(x,y)
plm<-lm(y~x)
min(x)
[1] 100
max(x)
[1] 280
points((131.2771),(1.251),col="red",pch="+") # The MLE of X
lines(c(124.3506,138.0647),c(1.251,1.251),col="red") # The above CI
newx<-data.frame(x=(100:280))
predlm<-predict(plm,newdata=newx,interval="prediction")
lines(newx$x,predlm[,"fit"],col="green")
lines(newx$x,predlm[,"lwr"],col="blue")
lines(newx$x,predlm[,"upr"],col="blue")

which is what I thought would happen in the first place, given the
quality of your data. Sorry for any inconvenience due to the above error.

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 20-Apr-05 Time: 08:58:37
------------------------------ XFMail ------------------------------
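The stepwise search above invites automation: uniroot can solve PF(X) = 0.95 on each side of the MLE, using the corrected criterion from this message. The calib.ci wrapper below (its name and interface are mine, not from the thread) is a sketch of one way to do it; R.calib is repeated in slightly condensed form so the block is self-contained:

```r
# Ted Harding's R.calib (condensed: a# = 0 is inlined, and only the
# quantities needed for the interval search are returned)
R.calib <- function(x, y, X, Y) {
  n <- length(x); mx <- mean(x); my <- mean(y)
  x <- x - mx; y <- y - my; X <- X - mx; Y <- Y - my
  bh <- sum(x*y)/sum(x^2); Xh <- Y/bh            # unrestricted MLEs
  sh2 <- sum((y - bh*x)^2)/(n + 1)
  D <- (n + 1)*sum(x^2) + n*X^2                  # MLEs with X fixed
  at <- (Y*sum(x^2) - X*sum(x*y))/D
  bt <- ((n + 1)*sum(x*y) + n*X*Y)/D
  st2 <- (sum((y - at - bt*x)^2) + (Y - at - bt*X)^2)/(n + 1)
  R <- sh2/st2; F <- (n - 2)*(1 - R)/R
  list(F = F, PF = pf(F, 1, n - 2), Xhat = Xh + mx)
}

# hypothetical wrapper (not from the original posts): solves
# PF(X) = level below and above the MLE with uniroot
calib.ci <- function(x, y, Y, level = 0.95, span = diff(range(x))) {
  Xhat <- R.calib(x, y, mean(x), Y)$Xhat   # Xhat does not depend on X
  f <- function(X) R.calib(x, y, X, Y)$PF - level
  lower <- uniroot(f, c(Xhat - span, Xhat))$root
  upper <- uniroot(f, c(Xhat, Xhat + span))$root
  c(Xhat = Xhat, lower = lower, upper = upper)
}

conc <- seq(100, 280, 20)   # mg/l
abs <- c(1.064, 1.177, 1.303, 1.414, 1.534, 1.642, 1.744, 1.852, 1.936, 2.046)
calib.ci(conc, abs, 1.251)  # approx 131.28 (124.35, 138.06), as found above
```

PF is near 0 at the MLE and climbs towards 1 on either side, so each bracket contains exactly one sign change and uniroot converges without the manual groping.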
Ted
Thank you for giving me so much of your time to provide an explanation of
the likelihood ratio approach to the calibration problem. It has certainly
provided me with a new insight into this problem. I will check out the
references you mentioned to get a better understanding of the statistics.
Out of interest I have also tried the function provided by David Lucy back
in March 2002 with a correction to the code and addition of the t-value.
The results are very close to those obtained by the likelihood ratio method,
although the output needs tidying up.
####################
# R function to do classical calibration
# val is a data.frame of the new dep values from which to predict indep
calib <- function(indep, dep, val, prob=0.95)
{
# generate the model y on x and get out predicted values for x
reg <- lm(dep~indep)
xpre <- (val - coef(reg)[1])/coef(reg)[2]
# generate a confidence interval
yyat <- ((dep - predict(reg))^2)
sigyyat <- ((sum(yyat)/(length(dep)-2))^0.5)
term1 <- sigyyat/coef(reg)[2]
sigxxbar <- sum((indep - mean(indep))^2)
denom <- sigxxbar * ((coef(reg)[2])^2)
t<-qt(p=(prob+(1-prob)/2), df=(length(dep)-2))
conf <- t*abs((((1+1/length(dep))+(((val - mean(dep))^2)/denom))^0.5)*term1)
results<-list(Predicted=xpre, Conf.interval=c(xpre-conf, xpre+conf))
return(results)
}
conc<-seq(100, 280, 20) # mg/l
abs<-c(1.064, 1.177, 1.303, 1.414, 1.534, 1.642, 1.744, 1.852, 1.936, 2.046)
# absorbance units
new.abs<-data.frame(abs=c(1.251, 1.324, 1.452))
calib(conc, abs, new.abs, prob=0.95)
$Predicted
abs
1 131.2771
2 144.6649
3 168.1394
$Conf.interval
$Conf.interval$abs
[1] 124.4244 137.9334 161.5477
$Conf.interval$abs
[1] 138.1298 151.3964 174.7310
Thanks
Mike
----- Original Message -----
From: "Ted Harding" <Ted.Harding at nessie.mcc.ac.uk>
To: "Mike White" <mikewhite.diu at tiscali.co.uk>
Cc: <R-help at stat.math.ethz.ch>
Sent: Wednesday, April 20, 2005 8:58 AM
Subject: RE: [R] Help with predict.lm
[...]
On 20-Apr-05 Mike White wrote:
[...]
Hi Mike. You should also look at

  Philip J. Brown & Rolf Sundberg
  Confidence and Conflict in Multivariate Calibration
  JRSS B, vol. 49 (1987), pp. 46-57

which greatly extends the profile likelihood/likelihood-based confidence
interval approach, relates it also to Bayesian approaches, and contrasts
both with other methods.

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 20-Apr-05 Time: 12:53:29
------------------------------ XFMail ------------------------------