Skip to content

inverse prediction and Poisson regression

13 messages · Spencer Graves, Peter Dalgaard, Brian Ripley +1 more

#
Hello to all, I'm a biologist trying to tackle a "fish" (Poisson Regression) which is just too big for my modest understanding of stats!!!

Here goes...

I want to find good literature or proper mathematical procedure to calculate a confidence interval for an inverse prediction of a Poisson regression using R. 

I'm currently trying to analyse a "dose-response" experiment. 

I want to calculate the dose (X) for 50% inhibition of a biological response (Y). My "response" is a "count" data that fits a Poisson distribution perfectly. 

I could make my life easy and calculate: "dose response/control response" = % of total response... and then use logistic regression, but somehow, that doesn't sound right.
 
Should I just stick to logistic regression and go on with my life? Can I be cured of this paranoia?
;-)

I thought a Poisson regression would be more appropriate, but I don't know how to "properly" calculate the dose equivalent to 50% inhibition. i/e confidence intervals, etc on the "X" = dose. Basically an "inverse" prediction problem.

By the way, my data is "graphically" linear for Log(Y) = log(X) where Y is counts and X is dose.

I use a Poisson regression to fit my dose-response experiment by EXCLUDING the response for dose = 0, because of log(0)

Under "R" =
(that's why you see the "dose[-1]" term. The "first" dose in the dose vector is 0. 

This is really a nice fit. I can obtain a nice slope (B) and intercept (A):

log(Y) = B log(x) + A

I do have a biological value for dose = 0 from my "control". i/e Ymax = some number with a Poisson error again

So, what I want is EC50x :

Y/Ymax = 0.5 = exp(B log(EC50x) + A) / Ymax

exp((log(0.5) + Log(Ymax)) - A)/B) = EC50x

That's all fine, except I don't have a clue on how to calculate the confidence intervals of EC50x or even if I can model this inverse prediction with a Poisson regression. In OLS linear regression, fitting X based on Y is not a good idea because of the way OLS calculates the slope and intercept. Is the same problem found in GLM/Poisson regression? Moreover, I also have a Poisson error on Ymax that I would have to consider, right?

Help!!!!
#
1.  If you provide a toy data set with, e.g., 5 observations, to 
accompany your example, it would be much easier for people to try out 
ideas and then give you a more solid response.

	  2.  Have you tried something like log(dose+0.5) or I(log(dose+0.5)) 
in your model statement in conjunction with "predict" or "predict.glm" 
on the output from "glm"?

hope this helps.  spencer graves
Vincent Philion wrote:
(Poisson Regression) which is just too big for my modest
understanding of stats!!!
procedure to calculate a confidence interval for an
inverse prediction of a Poisson regression using R.
experiment.
of a biological response (Y). My "response" is a "count"
data that fits a Poisson distribution perfectly.
response/control response" = % of total response...
and then use logistic regression, but somehow, that
doesn't sound right.
on with my life? Can I be cured of this paranoia?
appropriate, but I don't know how to "properly"
calculate the dose equivalent to 50% inhibition.
i/e confidence intervals, etc on the "X" = dose.
Basically an "inverse" prediction problem.
Log(Y) = log(X) where Y is counts and X is dose.
experiment by EXCLUDING the response for dose = 0,
because of log(0)
"first" dose in the dose vector is 0.
slope (B) and intercept (A):
my "control". i/e Ymax = some number with a Poisson
error again
to calculate the confidence intervals of EC50x or even
if I can model this inverse prediction with a Poisson
regression. In OLS linear regression, fitting X based
on Y is not a good idea because of the way OLS calculates
the slope and intercept. Is the same problem found in
GLM/Poisson regression? Moreover, I also have a Poisson
error on Ymax that I would have to consider, right?
#
Hello and thank you for your interest in this problem. 

"real life data" would look like this:

x	y
0		28
0.03		21
0.1		11
0.3		15
1		5
3		4
10		1
30		0
100		0

x	y
0	30
0.0025	30
0.02	25
0.16	25
1.28	10
10.24	0
81.92	0

X	Y
0	35
0.00025	23
0.002	14
0.016	6
0.128	5
1.024	3
8.192	2 

X	Y
0  43
0.00025  35
0.002  20
0.016  16
0.128  11
1.024  6
8.192   0 

Where X is dose and Y is response. 
the relation is linear for log(response) = b log(dose) + intercept

Response for dose 0 is a "control" = Ymax. So, What I want is the dose for 50% response. For instance, in example 1:

Ymax = 28 (this is also an observation with Poisson error)

So I want dose for response = 14 = approx. 0.3

any help would be greatly appreciated!

bye for now,

Vincent

  
    
#
On Fri, 25 Jul 2003, Vincent Philion wrote:

            
Is that log(*mean* response), that is a log link and exponential decay 
with dose?
Once you observe Ymax, Y is no longer Poisson.
What exactly is Ymax?  Is it the response at dose 0?  The mean response at
dose 0?  The largest response?  About the only thing I can actually
interpret is that you want to fit a curve of mean response vs dose, and
find the dose at which the mean response is half of that at dose 0.
That one is easy.

I think you are confusing response with mean response, and we can't 
disentangle them for you.
#
Dear Prof. Ripley & M. Philion:

First some commentary then questions for Prof. Ripley and M. Philion.

COMMENTARY

	  Prof. Ripley said, "to fit a curve of mean response vs dose,
and find the dose at which the mean response is half of that at dose 0.
That one is easy."  Unfortunately, it is not obvious to me at the 
moment.  From "www.r-project.org" -> search -> "R site search" -> 
"LD50", I found "dose.p", described on p. 193, sec. 7.2, of Venables and 
Ripley (2002) Modern Applied Statistics with S, 4th ed. (Springer).

	  Then I cut the data set down to a size that I could easily play with, 
and fit Poisson regression:

Phytopath <- data.frame(x=c(0, 0.03, 0.1), 		
	y=c(28, 21, 11))
(fitP100 <- glm(y~log(x+0.015), data=Phytopath[rep(1:3, 100),],
   family="poisson"))

Call:  glm(formula = y ~ log(x + 0.015), family = "poisson", data = 
Phytopath[rep(1:3,      100), ])

Coefficients:
    (Intercept)  log(x + 0.015)
         1.6088         -0.4203

(LD50P100 <- dose.p(fitP100, p=14))
              Dose         SE
p = 14: -2.451018 0.04858572

	  To get a 95% confidence interval from this, in S-Plus 6.1, I did:

   exp(LD50 + c(-2,0, 2)*LD50 at SE)-0.015
[1] 0.01762321 0.07120579 0.21279601

	  R 1.7.1 seemed to require a different syntax, which I couldn't parse 
in my present insomniac state (3:20 AM in California).

QUESTIONS:

PROF. RIPLEY:  Is this what you said was easy?

M. PHILION:  Does this provide sufficient information for you to now 
solve your problem?

hope this helps.  spencer graves
Prof Brian Ripley wrote:
> On Fri, 25 Jul 2003, Vincent Philion wrote:
>
 >
 >>Hello and thank you for your interest in this problem.
 >>
 >>"real life data" would look like this:
 >>
 >>x	y
 >>0		28
 >>0.03		21
 >>0.1		11
 >>0.3		15
 >>1		5
 >>3		4
 >>10		1
 >>30		0
 >>100		0
 >>
 >>x	y
 >>0	30
 >>0.0025	30
 >>0.02	25
 >>0.16	25
 >>1.28	10
 >>10.24	0
 >>81.92	0
 >>
 >>X	Y
 >>0	35
 >>0.00025	23
 >>0.002	14
 >>0.016	6
 >>0.128	5
 >>1.024	3
 >>8.192	2
 >>
 >>X	Y
 >>0  43
 >>0.00025  35
 >>0.002  20
 >>0.016  16
 >>0.128  11
 >>1.024  6
 >>8.192   0
 >>
 >>Where X is dose and Y is response.
 >>the relation is linear for log(response) = b log(dose) + intercept
 >
 >
 > Is that log(*mean* response), that is a log link and exponential decay
 > with dose?
 >
 >
 >>Response for dose 0 is a "control" = Ymax. So, What I want is the dose
 >>for 50% response. For instance, in example 1:
 >>
 >>Ymax = 28 (this is also an observation with Poisson error)
 >
 >
 > Once you observe Ymax, Y is no longer Poisson.
 >
 >
 >>So I want dose for response = 14 = approx. 0.3
 >
 >
 > What exactly is Ymax?  Is it the response at dose 0?  The mean 
response at
 > dose 0?  The largest response?  About the only thing I can actually
 > interpret is that you want to fit a curve of mean response vs dose, and
 > find the dose at which the mean response is half of that at dose 0.
 > That one is easy.
 >
 > I think you are confusing response with mean response, and we can't
 > disentangle them for you.
 >
#
Prof Brian Ripley <ripley at stats.ox.ac.uk> writes:
....
I don't feel all that confused. Y is Poisson distributed with some
mean depending on x. Ymax is a value at X=0, i.e. Poisson distr.
with a mean as large as it can be. 

I think the main confusion here is trying to fit a functional
relationship which doesn't extend to X=0. If you extrapolate a
log-loglinear relation back to X=0, you get an infinite maximal
response if b is negative, so this is going to be inconsistent with a
finite Ymax. In some of the data sets I believe you actually do see a
leveling off for very small doses.

If you insist on this peculiar model, you'd end up with estimating the
mean of Ymax by its observed value. Then you can get b and the
intercept from the observations with X>0, and find your estimate of
halving dose by solving

 log(Ymax/2) = b * log(dose50) + intercept

i.e. dose50 = (log(Ymax/2)-intercept)/b. That's a nonlinear function
of the estimates, so you'd need (at least) to employ the Delta method
to find the approximate variance of the estimate.

However, I'd suggest that you should look for a more realistic
functional form of the relation, e.g. a logistic curve in log(x) or a
Michalis-Menten style inhibition (mean(Y) = ymax/(1+dose/dose50) or
variants thereof). These models are not (necessarily) GLMs, but I
think you can fit them quite well with gnls() and a suitable variance
function.

[In fact the ymax/(1+dose/dose50) model is a GLM if you use an inverse
link and reparametrize with 1/ymax, 1/(ymax*dose50), but inverse links
are not built-in for the poisson() family, so you'd have to modify the
code yourself.]
#
Hello sir, answers follow...

... Where X is dose and Y is response. the relation is linear for log(response) 
 = b log(dose) + intercept
 
 *** Is that log(*mean* response), that is a log link and exponential decay with 
 dose?
 
I'm not sure I understand what you mean by "mean", (no pun intended!) but Y is a biologicial "growth". Only one "observation" for each X. But this observation is from the growth contribution of about 500 individuals, so I guess it is a "mean" response by design.

the log link is for the Poisson regression, so the GLM is "response ~ log(dose), (family=poisson)"

 ...Response for dose 0 is a "control" = Ymax. So, What I want is the dose for 50% response. 

*** Once you observe Ymax, Y is no longer Poisson.

I don't understand this? What do you mean? Please explain.
 
 ***What exactly is Ymax?  Is it the response at dose 0? 
Correct. it is measured the same way as for any other Y. (It is also the largest response because the "dose" is always detrimental to growth)

***About the only thing I can actually  interpret is that you want to fit a curve of mean response vs dose, and
 find the dose at which the mean response is half of that at dose 0.

That's it. that sounds right! How? (Confidence interval on log scale and on real scale, etc) Given that the error on Y is Poisson and not "normal"

***That one is easy.
 
OK...?

*** I think you are confusing response with mean response, and we can't 
 disentangle them for you.
 
What else is needed?

bye for now,
#
Hello, and thanks for this.
I found the same, but this is for logistic regression I think, not Poisson. This is used to calculate the dose which causes 50% "death". I could do it this way by "pretending" my data is binomial.
i/e Y/Ymax = some value between 0 and 1. I could "pretend" my response data is "dead" or "alive", but I'm not sure this is "proper". but maybe it is? Any hints on this?

Y and Ymax follows a Poisson distribution. Can I just use Y/Ymax, run a logistic regression and go on with my life???
;-)


(LD50P100 <- dose.p(fitP100, p=14))
Wow, you got the dose.p function to work with poisson regression? but under SPlus only? is it giving the right results?

... my
The same for me!!!

;-)
I worked on this 2:00AM EDT
thanks and have a nice day, possibly starting with a good coffee!!!
If this works under R and is giving the right solution, I owe you BIG TIME !

bye for now,
#
Ymax is the maximum observation in your example, and also the observation 
at zero.  I was asking which you meant: if you meant Y at 0 (and I think 
you do) then it is somewhat misleading notation.

You have a set of Poisson random variables Y_x at different values of x.
Poisson random variables have a mean (I am using standard statistical 
terminilogy), so let's call that mu(x).  Then you seem to want the value 
of x such that  mu(x) = mu(0)/2 *or* mu(x) = Y_0/2, and I don't know 
which, except that in your model mu(0) would be infinity, and so the
model cannot fit your data (finite values of Y_0 have zero probability).
On Fri, 25 Jul 2003, Vincent Philion wrote:

            
So you have -Inf as the explanatory variable at zero dose?
That was understanding Ymax to be the maximum Y, which is what it looks 
like.
the largest response because the "dose" is always detrimental to growth)

The last is not true, given your assumptions,  It could have the largest 
mean response, but 0 is a possible value for Y_0.
Fit a model for the mean response (one that actually can fit your data), 
and solve the estimated mu(x) = mu()/2 or Y_0/2.  That gives you an 
estimate, and the delta method will give your standard errors.

  
    
#
The Poisson assumption means that Y is a number of independent events 
from a theoretically infinite population occurring in a specific time or 
place.  The function "glm" with 'family="poisson"' with the default link 
= "log" assumes that the logarithm of the mean of Y is a linear model in 
the explanatory variable.

	  How is Y measured?  Is it the number out of 500 who exceed a certain 
threshold, or is it the average percentage increase in weight of the 500 
or what?  If it the number out N, with N approximately 500 (and you know 
N), then you have a logistic regression situation.  In that case, 
section 7.2 in Venables and Ripley (2002) should do what you want.  If Y 
is a percentage increase

	  When dose = 0, log(dose) = (-Inf).  Since 0 is a legitimate dose, 
log(dose) is not acceptable in a model like this.  You need a model like 
Peter suggested.  Depending on you purpose, log(dose+0.015) might be 
sufficiently close to a model like what Peter suggested to answer your 
question.  If not, perhaps this solution will help you find a better 
solution.

	  I previously was able to get dose.p to work in R, and I just now was 
able to compute from its output.  The following worked in both S-Plus 
6.1 and R 1.7.1:

 > LD50P100p <- print(LD50P100)
              Dose         SE
p = 14: -2.451018 0.04858572
 > exp(LD50P100p[1,1]+c(-2,0,2)*LD50P100p[1,2])-0.015
[1] 0.06322317 0.07120579 0.08000303

hope this helps.  spencer graves
Vincent Philion wrote:
but Y is a biologicial "growth". Only one "observation" for each X. But
this observation is from the growth contribution of about 500 individuals,
so I guess it is a "mean" response by design.
#
Hello again, sorry for the notation. Again, I'm just a biologist!!!

;-)

But I'm enjoying this problem quite a bit! I'm very grateful for all the input. This is great.
On 2003-07-25 08:38:00 -0400 Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
Answers:
I will clean up my notation!
OK, I want x for mu(x) = mu(0)/2.
Correct, This is part of the problem! The model does not "hold" for X = 0.
Yes, you are right, but then there is no growth, nad no LD50 value, so we reject this sample...
Then you suggest using another model that will account for zero dose, OK. I think I saw something similar in another reply. I need to read it more carefully.
#
Hi, ... and good morning!

;-)
On 2003-07-25 08:43:35 -0400 Spencer Graves <spencer.graves at PDF.COM> wrote:

            
OK, I think my data can fit that description.
Y is the number of line intercepts which encounters mycelial growth. i/e if mycelia intercepts the line twice, 2 is reported. This follows poisson. 

If it the number out N, with N approximately 500 (and you know N),
No, 500 spores can grow, but there is no "real" limit on the amount of growth possible, and so no limit on the number of intercepts. So this is why I adopted Poisson, not knowing how complicated my life would become!!!
;-)

  In that case, section 7.2 in
... But you may be right, that I'm making this just too complicated and that I should simply look at percentage... Any comments on that?
OK, I see I will need stronger coffee to tackle this, but I will read this in depth today.

 Depending on you purpose, log(dose+0.015) might be
In other words, "cheat" and model Y_0 with a "small" value = log(0.015) ? How would this affect the LD50 value calculated and the confidence intervals? I guess I could try several methods, but how would I go about choosing the right one? Criteria?
OK, I will need to try this (later today). I don't see "dose.p" in this?

again, many thanks,
#
Peter Dalgaard suggested using "a Michalis-Menten style inhibition 
(mean(Y) = ymax/(1+dose/dose50) or variants thereof)."  If you are not 
familiar with the literature on "Michalis-Menten inhibition", I suggest 
you research that and cite it in your research report.  Even if you 
don't use it, the knowledge of it might help persuade others that you at 
least considered it.

	  If we adopt Ripley's notation, this is mu(dose) = 
ymax/(1+dose/dose50), where ymax and dose50 are to be estimated.  We can 
get this, sort of, using "glm", as follows:  This model is the same as 
log(mu(dose)) = ymax - log(1+dose/dose50).  I have not tried this, but I 
believe that for fixed dose50 = 0.3, this model can be written for glm 
with the toy example I used before as follows:

Phytopath <- data.frame(x=c(0, 0.03, 0.1), 		
	y=c(28, 21, 11))
fitP.3 <- glm(y~offset(-log(1+x/0.3)), data=Phytopath[rep(1:3, 100),],
   family="poisson")
fitP.3$deviance
[1] 359.5893

	  Now, replace 0.3 by other possible values for dose50 to find the 
minimum "deviance".  Then get a 95% confidence interval using "profile 
likelihood" to find the points above and below the minimium where the 
"deviance" exceeds the minimum by  qchisq(0.95, 1) = 3.841459.  There 
are better ways to do this, but none that I can make work in the next 2 
minutes.  When I did this, I get the following:

 > fitP.06 <- glm(y~offset(-log(1+x/0.06)), data=Phytopath[rep(1:3, 100),],
+  family="poisson")
 > fitP.06$deviance
[1] 16.54977
 > fitP.065 <- glm(y~offset(-log(1+x/0.065)), data=Phytopath[rep(1:3, 
100),],
+  family="poisson")
 > fitP.065$deviance
[1] 11.59525
 > fitP.07 <- glm(y~offset(-log(1+x/0.07)), data=Phytopath[rep(1:3, 100),],
+  family="poisson")
 > fitP.07$deviance
[1] 10.96356
 > fitP.075 <- glm(y~offset(-log(1+x/0.075)), data=Phytopath[rep(1:3, 
100),],
+  family="poisson")
 > fitP.075$deviance
[1] 13.49366
 > fitP.08 <- glm(y~offset(-log(1+x/0.08)), data=Phytopath[rep(1:3, 100),],
+  family="poisson")
 > fitP.08$deviance
[1] 18.34463

 From this, I get an approximate maximum likelihood estimate for dose50 
of 0.07 with a 95% confidence interval by rough interpolation of 0.061 
to 0.076.

hope this helps.  spencer graves
p.s.  Have you plotted y vs. x with appropriate transformations?  If 
not, I suggest you do that before anything else.  Doug Bates says that 
students in his classes who don't plot the data get instant F's.  When I 
don't bother plotting the data several different ways, I often do stupid 
things.
Vincent Philion wrote:
i/e if mycelia intercepts the line twice, 2 is reported. This follows
poisson.
growth possible, and so no limit on the number of intercepts. So this is
why I adopted Poisson, not knowing how complicated my life would become!!!
and that I should simply look at percentage... Any comments on that?
How would this affect the LD50 value calculated and the confidence 
intervals?
I guess I could try several methods, but how would I go about choosing the
right one? Criteria?