In a clinical study, events in patients were observed during multiple visits. On each visit, a continuous predictor variable was also recorded; the Poisson-distributed number of events is the endpoint of the study. The following model would be suitable:

glmer(nevent ~ predictor + (1 | subj), data = d, family = poisson)

but there is a catch: the recording interval on each day varies randomly from 30 to 60 minutes, unrelated to the study parameters. The statistical consultant at the university recommended the conservative solution of truncating ALL records to the first 30 minutes and discarding the tails, but the PhD student who did the study was not too happy to lose all data beyond 30 minutes. A compromise would be to normalize all data to events per 45 minutes (or per median(duration)), assuming that the variance in duration is not too large. Is there a better way to factor out the nuisance parameter duration?

Dieter
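The two workarounds in the question can be sketched on simulated data to see what each does to the counts. This is a minimal sketch, assuming (not stated in the message) that events arrive as a homogeneous Poisson process within each recording, so the number of events in the first 30 minutes of a recording with nevent events is Binomial(nevent, 30/duration). The simulation parameters follow the sample data Dieter posts later in the thread; the columns nevent30, nevent45 and the variable frac_discarded are hypothetical names introduced here.

```r
# Sketch: what truncation and normalization do to the counts.
# Assumption: events follow a homogeneous Poisson process within each
# recording, so events in the first 30 min ~ Binomial(nevent, 30/duration).
set.seed(1)
n <- 50
d <- data.frame(
  subj      = factor(rep(1:10, length.out = n)),
  duration  = runif(n, 30, 60),        # recording length in minutes
  predictor = rnorm(n, 50, 10)
)
d$nevent <- rpois(n, d$predictor * d$duration / 500)

# Option 1 (consultant): keep only the events from the first 30 minutes.
d$nevent30 <- rbinom(n, size = d$nevent, prob = 30 / d$duration)
frac_discarded <- 1 - sum(d$nevent30) / sum(d$nevent)
print(frac_discarded)

# Option 2 (compromise): rescale the counts to events per 45 minutes.
d$nevent45 <- d$nevent * 45 / d$duration
# The rescaled values are generally not integers, so they are no longer
# Poisson counts; glm(..., family = poisson) warns about non-integer values.
head(d$nevent45)
```

With durations spread uniformly between 30 and 60 minutes, truncation discards on the order of a third of the observed events, while rescaling keeps them but breaks the count (and hence the Poisson variance) structure.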
Adjusting for random recording intervals in glmer/poisson
3 messages · Joshua Wiley, Dieter Menne
Hi Dieter,

I do not think that I understand the question or problem very well. What is the significance of the recording interval varying? If the issue is that with a longer recording time there are more opportunities for events to occur, then what about treating duration as an exposure and including it in the offset? Essentially, you then model the rate rather than the counts.

Again, apologies if I am grossly misunderstanding the issue.

Cheers,

Josh

On Wed, Jul 4, 2012 at 10:35 PM, Dieter Menne <dieter.menne at menne-biomed.de> wrote:
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
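A short derivation of why an exposure such as recording duration enters a log-link Poisson model on the log scale. The notation is introduced here and is not from the original messages: y_ij is the event count of subject i at visit j, t_ij the recording duration in minutes, x_ij the predictor, b_i the subject random intercept, and lambda_ij the (assumed constant) event rate per minute during that recording.

```latex
\begin{aligned}
y_{ij} \mid b_i &\sim \operatorname{Poisson}(\mu_{ij}), \qquad \mu_{ij} = t_{ij}\,\lambda_{ij},\\
\log \lambda_{ij} &= \beta_0 + \beta_1 x_{ij} + b_i, \qquad b_i \sim N(0, \sigma_b^2),\\
\log \mu_{ij} &= \log t_{ij} + \beta_0 + \beta_1 x_{ij} + b_i .
\end{aligned}
```

The term \log t_{ij} appears in the linear predictor with its coefficient fixed at 1, which is exactly what an offset is; \beta_0 and \beta_1 then describe the event rate per minute rather than the raw count.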
Joshua Wiley wrote:
What is the significance of the recording interval varying? If the issue is that with a longer recording time there are more opportunities for events to occur, then what about treating duration as an exposure and including it in the offset? Essentially, you then model the rate rather than the counts.
Good to hear that you suggest putting it into the offset; I wanted to do this, but was not sure what exactly to put into the offset term. Duration or log(duration)?

Dieter

Apologies: I forgot to attach the simulated sample data to the original message.

library(lme4)
nsubj <- 10
nvisit <- 5
set.seed(100)
d <- data.frame(
  subj = as.factor(1:nsubj),                 # recycled: each subject appears nvisit times
  duration = runif(nsubj * nvisit, 30, 60),  # in minutes
  predictor = rnorm(nsubj * nvisit, 50, 10))
d$nevent <- with(d, rpois(nsubj * nvisit, predictor * duration / 500))

# Proposed solution by the university statistician:
# use only the data from the first 30 minutes (not shown here), then fit
glmer(nevent ~ predictor + (1 | subj), data = d, family = poisson)
# This is not the statistician's result: it is fitted to the full counts,
# because the truncation step is not implemented here.

# Proposed by Joshua
glmer(nevent ~ predictor + offset(log(duration)) + (1 | subj),
      data = d, family = poisson)
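One way to check the "duration or log(duration)?" question empirically is sketched below, using the same data-generating process as the sample data above but with more subjects and visits so the estimates are stable (the sizes 200 and 10 are arbitrary choices for illustration). With the log link, log(duration) entered as an ordinary covariate should get a slope close to 1, the value an offset fixes it at. The sketch also assumes the `offset` argument of lme4's glmer(), which should give the same fit as an offset() term in the formula; the model names m_free, m_off and m_arg are hypothetical.

```r
library(lme4)

# Same data-generating process as the sample data, scaled up.
set.seed(100)
nsubj  <- 200
nvisit <- 10
d <- data.frame(
  subj      = factor(rep(1:nsubj, times = nvisit)),
  duration  = runif(nsubj * nvisit, 30, 60),   # minutes
  predictor = rnorm(nsubj * nvisit, 50, 10)
)
d$nevent <- rpois(nrow(d), d$predictor * d$duration / 500)
# The simulation has no true subject effect, so glmer may report a
# singular (boundary) fit for the random intercept.

# 1. log(duration) as a free covariate: its slope should be close to 1.
m_free <- glmer(nevent ~ predictor + log(duration) + (1 | subj),
                data = d, family = poisson)
slope_logdur <- fixef(m_free)[["log(duration)"]]

# 2. Offset term in the formula (slope fixed at 1).
m_off <- glmer(nevent ~ predictor + offset(log(duration)) + (1 | subj),
               data = d, family = poisson)

# 3. The same model via the offset argument of glmer().
m_arg <- glmer(nevent ~ predictor + (1 | subj), data = d,
               family = poisson, offset = log(d$duration))

print(slope_logdur)
print(cbind(formula_offset = fixef(m_off), argument_offset = fixef(m_arg)))
```

A slope near 1 supports offset(log(duration)); offset(duration) would instead add raw minutes to the log of the expected count, which does not correspond to modelling a rate per minute.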