Skip to content

Predicted values from glm() when linear predictor is NA.

8 messages · Andrew Robinson, Ebert,Timothy Aaron, David Winsemius +1 more

#
I have a data frame with a numeric ("TrtTime") and a categorical
("Lifestage") predictor.

Level "L1" of Lifestage occurs only with a single value of TrtTime,
explicitly 12, whence it is not possible to estimate a TrtTime "slope"
when Lifestage is "L1".

Indeed, when I fitted the model

    fit <- glm(cbind(Dead,Alive) ~ TrtTime*Lifestage, family=binomial,
               data=demoDat)

I got:
That is, TrtTime:LifestageL1 is NA, as expected.

I would have thought that fitted or predicted values corresponding to
Lifestage = "L1" would thereby be NA, but this is not the case:
That is, the predicted values on the scale of the linear predictor are
large and positive, rather than being NA.

What this amounts to, it seems to me, is saying that if the linear
predictor in a Binomial glm is NA, then "success" is a certainty.
This strikes me as being a dubious proposition.  My gut feeling is that
misleading results could be produced.

Can anyone explain to me a rationale for this behaviour pattern?
Is there some justification for it that I am not currently seeing?
Any other comments?  (Please omit comments to the effect of "You are as
thick as two short planks!". :-) )

I have attached the example data set in a file "demoDat.txt", should
anyone want to experiment with it.  The file was created using dput() so
you should access it (if you wish to do so) via something like

    demoDat <- dget("demoDat.txt")

Thanks for any enlightenment.

cheers,

Rolf Turner
#
Hi Rolf,

that's an interesting observation.

I agree that it is counter-intuitive that the fitted values / predictions are not NA.

However, I don't agree with your comment that <<if the linear predictor in a Binomial glm is NA, then "success" is a certainty>> - that seems to be a peculiarity of these data - note for these three observations there are thousands dead and zero alive and that you get similar outcomes if you omit TrtTime altogether ...
26 65 131
20.02007 20.02007 20.02007
Cheers,

Andrew

--
Andrew Robinson
Chief Executive Officer, CEBRA and Professor of Biosecurity,
School/s of BioSciences and Mathematics & Statistics
University of Melbourne, VIC 3010 Australia
Tel: (+61) 0403 138 955
Email: apro at unimelb.edu.au
Website: https://researchers.ms.unimelb.edu.au/~apro at unimelb/

I acknowledge the Traditional Owners of the land I inhabit, and pay my respects to their Elders.
On 28 Jul 2022, 10:27 AM +1000, Rolf Turner <r.turner at auckland.ac.nz>, wrote:
External email: Please exercise caution


I have a data frame with a numeric ("TrtTime") and a categorical
("Lifestage") predictor.

Level "L1" of Lifestage occurs only with a single value of TrtTime,
explicitly 12, whence it is not possible to estimate a TrtTime "slope"
when Lifestage is "L1".

Indeed, when I fitted the model

fit <- glm(cbind(Dead,Alive) ~ TrtTime*Lifestage, family=binomial,
data=demoDat)

I got:

as.matrix(coef(fit))
[,1]
(Intercept) -0.91718302
TrtTime 0.88846195
LifestageEgg + L1 -45.36420974
LifestageL1 14.27570572
LifestageL1 + L2 -0.30332697
LifestageL3 -3.58672631
TrtTime:LifestageEgg + L1 8.10482459
TrtTime:LifestageL1 NA
TrtTime:LifestageL1 + L2 0.05662651
TrtTime:LifestageL3 1.66743472

That is, TrtTime:LifestageL1 is NA, as expected.

I would have thought that fitted or predicted values corresponding to
Lifestage = "L1" would thereby be NA, but this is not the case:

predict(fit)[demoDat$Lifestage=="L1"]
26 65 131
24.02007 24.02007 24.02007

fitted(fit)[demoDat$Lifestage=="L1"]
26 65 131
1 1 1

That is, the predicted values on the scale of the linear predictor are
large and positive, rather than being NA.

What this amounts to, it seems to me, is saying that if the linear
predictor in a Binomial glm is NA, then "success" is a certainty.
This strikes me as being a dubious proposition. My gut feeling is that
misleading results could be produced.

Can anyone explain to me a rationale for this behaviour pattern?
Is there some justification for it that I am not currently seeing?
Any other comments? (Please omit comments to the effect of "You are as
thick as two short planks!". :-) )

I have attached the example data set in a file "demoDat.txt", should
anyone want to experiment with it. The file was created using dput() so
you should access it (if you wish to do so) via something like

demoDat <- dget("demoDat.txt")

Thanks for any enlightenment.

cheers,

Rolf Turner

--
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
#
Time is often used in this sort of problem, but really time is not relevant. A better choice is accumulated thermal units. The insect will molt when X thermal units have been accumulated. This is often expressed as degree days, but could as easily be other units like degree seconds. However, I suspect that fine time units exceeds the accuracy of the measurement system. A growth chamber might maintain 28 C, but the temperature the insect experiences might be somewhat different thereby introducing additional variability in the outcome. No thermal units accumulated, no development, so that is not an issue.
    This approach allows one to predict life stage over a large temperature range. Accuracy can be improved if one knows the lower development threshold. At high temperatures development stops, and a mortality function can be added.

Tim

-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Rolf Turner
Sent: Wednesday, July 27, 2022 8:26 PM
To: r-help <r-help at r-project.org>
Subject: [R] Predicted values from glm() when linear predictor is NA.

[External Email]

I have a data frame with a numeric ("TrtTime") and a categorical
("Lifestage") predictor.

Level "L1" of Lifestage occurs only with a single value of TrtTime, explicitly 12, whence it is not possible to estimate a TrtTime "slope"
when Lifestage is "L1".

Indeed, when I fitted the model

    fit <- glm(cbind(Dead,Alive) ~ TrtTime*Lifestage, family=binomial,
               data=demoDat)

I got:
That is, TrtTime:LifestageL1 is NA, as expected.

I would have thought that fitted or predicted values corresponding to Lifestage = "L1" would thereby be NA, but this is not the case:
That is, the predicted values on the scale of the linear predictor are large and positive, rather than being NA.

What this amounts to, it seems to me, is saying that if the linear predictor in a Binomial glm is NA, then "success" is a certainty.
This strikes me as being a dubious proposition.  My gut feeling is that misleading results could be produced.

Can anyone explain to me a rationale for this behaviour pattern?
Is there some justification for it that I am not currently seeing?
Any other comments?  (Please omit comments to the effect of "You are as thick as two short planks!". :-) )

I have attached the example data set in a file "demoDat.txt", should anyone want to experiment with it.  The file was created using dput() so you should access it (if you wish to do so) via something like

    demoDat <- dget("demoDat.txt")

Thanks for any enlightenment.

cheers,

Rolf Turner

--
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
#
On 7/27/22 17:26, Rolf Turner wrote:
The NA is most likely caused by aliasing, so some other combination of 
factors a perfect surrogate for every case with that level of the 
interaction. The `predict.glm` function always requires a complete set 
of values to construct a case. Whether apparent incremental linear 
prediction of that interaction term is large or small will depend on the 
degree of independent contribution of the surrogate levels of other 
variables..


David.
#
On Thu, 28 Jul 2022 00:41:52 +0000
Andrew Robinson <apro at unimelb.edu.au> wrote:

            
That gives me some comfort!

Does anyone from R-core feel like commenting?
Hmm.  Yeah.  I was probably jumping the gun a bit there.  I'll think on
this a bit more.

This issue arose in a more complicated context (which I won't try to go
into) in which I really need some insight into how to deal with missing
values of the linear predictor, and what the resulting fitted values
actually mean.  Psigh!

cheers,

Rolf
#
On Thu, 28 Jul 2022 00:42:51 +0000
"Ebert,Timothy Aaron" <tebert at ufl.edu> wrote:

            
Very cogent comments in respect of dealing with the underlying
practical problem, but I am not so much concerned with the practical
problem at the moment but rather with the workings of the software that
I am using.

cheers,

Rolf

P.S.  I am at several removes from the data set(s) that I am messing
about with.  But if my understanding is correct (always an assumption
of which to be sceptical!) these data were collected with the
temperature being held *constant*, whence time and accumulated thermal
units would be equivalent.  Is it not so?

R.
#
On Wed, 27 Jul 2022 18:25:23 -0700
David Winsemius <dwinsemius at comcast.net> wrote:
<SNIP>
<SNIP>

No, I think it's much simpler than that.  Essentially it boils down to
the fact that if y is 1:10 and x is rep(1,10) then

    lm(y ~ x)

gives a slope estimate of NA.  As it clearly should.

cheers,

Rolf
#
Sadly, I see that your practical options are limited. In regards to time and thermal units being equivalent if temperature is constant. The practical answer is yes. The technical answer is no because the growth chamber is not able to keep temperature constant and the insect can moderate the experienced temperature by moving to different parts of the plant (assuming this used plants and not artificial diet or leaf disks). Temperature might be more constant if these are mosquito larvae in vats of water. The water would have thermal mass to even out temperature fluctuations in the growth chamber.
   Often people assume that a growth chamber set to 25C is always 25C. Data show this is not the case even in high end or custom made growth chambers.
   Using time as a surrogate for accumulated thermal units under constant temperature will introduce some error in time due to non-uniform and non-constant temperatures within the growth chamber. Mostly the data are not present to document this let alone include this in the model. So we take the biologists view of statistics and ignore the problem that we cannot solve.

  If I measure time in days, then it makes sense that I can have egg hatch at time 0. However, it is not biologically possible to have anything happen in zero time. If my accuracy in time measurement is "days" then maybe I should consider introducing some small time value for egg hatch. Say eggs hatch at time 0.2 days. That is below the resolution of my data but reconciles the biological impossibility of anything happening in zero time. I note that time is integer in the data set.

  Another odd thing in the data. I assume that the values for "alive" represent the number of living individuals at different time intervals. At the end of the "Alive" data I note that there are a few time intervals where none are alive followed by a living individual. I am not a fan of zombies, or raising the dead (insects). Either the definition of dead is more Monty Python (https://www.youtube.com/watch?v=Jdf5EXo6I68), or I don't quite understand the data.

Tim   

-----Original Message-----
From: Rolf Turner <r.turner at auckland.ac.nz> 
Sent: Wednesday, July 27, 2022 10:10 PM
To: Ebert,Timothy Aaron <tebert at ufl.edu>
Cc: r-help <r-help at r-project.org>
Subject: Re: [R] Predicted values from glm() when linear predictor is NA.

[External Email]

On Thu, 28 Jul 2022 00:42:51 +0000
"Ebert,Timothy Aaron" <tebert at ufl.edu> wrote:

            
Very cogent comments in respect of dealing with the underlying practical problem, but I am not so much concerned with the practical problem at the moment but rather with the workings of the software that I am using.

cheers,

Rolf

P.S.  I am at several removes from the data set(s) that I am messing about with.  But if my understanding is correct (always an assumption of which to be sceptical!) these data were collected with the temperature being held *constant*, whence time and accumulated thermal units would be equivalent.  Is it not so?

R.

--
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276