Getting out of using a mixed model if "nested" data points are not actually correlated?
On 03/22/2011 08:50 AM, M.S.Muller wrote:
Dear all, I am trying to analyze some behavioural data that have a zero-inflated Poisson distribution. I want to use the 'zeroinfl' function from the 'pscl' package that allows me to analyze a Poisson process (that generates the count portion of the data) and a binomial process (that generates the excess zeros) all within the same model. The only problem is that I have a paired design: 32 individuals split into sixteen sibling pairs (each sib in a pair received a different treatment; treatment is the predictor of main interest, although I have a couple of other covariates). Ideally, I would include 'sibling pair' as a random factor, but unfortunately inclusion of random factors is not possible in the 'zeroinfl' function. Is there a way for me to demonstrate that siblings are statistically independent (e.g. by showing no correlations in their residuals or something) and therefore justify treating them as independent data points in the model? Thanks in advance for your help. Martina
Three possibilities: (1) use glmmADMB with zeroinfl=TRUE (at the moment it will only handle a single random effect, but it seems that your problem doesn't require more); (2) use MCMCglmm (see the course notes for an example); (3) as you suggest, fit the model and then show a lack of statistically significant correlations in the residuals. I would probably try #1 first.
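For option (3), one possible sketch of the residual check: fit the zero-inflated model, then test whether siblings' residuals are correlated within pairs. The data below are simulated placeholders; in practice 'resid' would come from residuals() on the 'zeroinfl' fit, ordered by pair.

```r
## Sketch of option (3): test for within-pair correlation of residuals.
## 'resid' here is simulated noise standing in for residuals from a
## zeroinfl() fit; 'pair' indexes the 16 sibling pairs (2 sibs each).
set.seed(1)
n_pairs <- 16
pair  <- rep(seq_len(n_pairs), each = 2)
resid <- rnorm(2 * n_pairs)            # placeholder residuals, ordered by pair

## One row per pair: column 1 = first sib's residual, column 2 = second sib's
wide <- matrix(resid, ncol = 2, byrow = TRUE)

## A small, non-significant correlation is (weak) support for treating
## siblings as independent; a clear correlation argues for a pair effect
ct <- cor.test(wide[, 1], wide[, 2])
ct$estimate
ct$p.value
```

Note that failing to reject independence is not proof of it; with only 16 pairs the test has limited power, which is one reason trying the random-effect route first may be safer.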
On 22-03-11, r-sig-mixed-models-request at r-project.org wrote:
Today's Topics:

1. Re: Offtopic: Why I am receiving duplicate emails? (Douglas Bates)
2. Re: Satterthwaite's df in lme (Douglas Bates)
3. hlm (software) exploratory analysis (Sebastián Daza)
4. Re: Observation-level random effect to model overdispersion (John Maindonald)
5. Library (lattice) (Shankar Lanke)
6. Re: Library (lattice) (Joshua Wiley)
7. Re: generalized mixed linear models, glmmPQL and GLMER give very different results that both do not fit the data well... (Franssens, Samuel)

----------------------------------------------------------------------
Message: 1
Date: Mon, 21 Mar 2011 13:14:33 -0500
From: Douglas Bates <bates at stat.wisc.edu>
To: Manuel Ramón Fernández <m.ramon.fernandez at gmail.com>
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Offtopic: Why I am receiving duplicate emails?

2011/3/13 Manuel Ramón Fernández <m.ramon.fernandez at gmail.com>:
For a couple of weeks I have been receiving duplicate emails from the list. Is this happening to anyone else? How can I fix it?
I don't know why you are receiving duplicates. Your address above is subscribed. Is it possible that you have another address subscribed and mail to the second address is forwarded to your gmail.com account?

------------------------------

Message: 2
Date: Mon, 21 Mar 2011 15:13:34 -0500
From: Douglas Bates <bates at stat.wisc.edu>
To: "Flores-de-Gracia, Eric" <eef201 at exeter.ac.uk>
Cc: "r-sig-mixed-models at r-project.org" <r-sig-mixed-models at r-project.org>
Subject: Re: [R-sig-ME] Satterthwaite's df in lme

On Thu, Feb 24, 2011 at 12:05 PM, Flores-de-Gracia, Eric <eef201 at exeter.ac.uk> wrote:
Hi all.
I am just running an lme model and found some criticism regarding the fact that the DF appear just as integers.
What are the implications of this, in terms of reporting those DF and the reliability of my model, when other packages like SPSS that do mixed models report DF based on Satterthwaite's correction? My data set is unbalanced, so should I expect this to affect the calculation of DF using lme?
There are many formulas for approximating degrees of freedom in linear mixed models, but they are all approximations. That is, the ratio of the estimate to its approximate standard error isn't distributed as a t, so the question of which degrees of freedom works best is a bit vague to begin with. Typically the exact number of degrees of freedom chosen only makes a difference when the sample sizes are small or the number of levels for one or more of the grouping factors for the random effects is small. In practice the difference between a t distribution with 40 degrees of freedom and a t distribution with 400 degrees of freedom is negligible. If the degrees of freedom provided by the summary of an lme fit are reasonably large, then the particular approximation used is not important.

------------------------------

Message: 3
Date: Mon, 21 Mar 2011 17:20:19 -0500
From: Sebastián Daza <sebastian.daza at gmail.com>
To: "r-sig-mixed-models at r-project.org" <r-sig-mixed-models at r-project.org>
Subject: [R-sig-ME] hlm (software) exploratory analysis

Hi everyone, Does anyone know if there is a procedure to conduct an exploratory analysis as HLM performs? Exploratory analysis is defined as "estimated level-2 coefficients and their standard errors obtained by regressing EB residuals on level-2 predictors selected for possible inclusion in subsequent HLM runs". Thank you in advance.

-- Sebastián Daza, sebastian.daza at gmail.com

------------------------------

Message: 4
Date: Tue, 22 Mar 2011 09:23:07 +1100
From: John Maindonald <john.maindonald at anu.edu.au>
To: "M.S.Muller" <m.s.muller at rug.nl>
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Observation-level random effect to model overdispersion

One point, additional to other responses.
There is just one "miniature variance component" (by the way, not miniature in the sense that it has to be small). As I understand it, a normal distribution with this variance generates one random effect, on the scale of the linear predictor, for each observation. On the scale of the response, well . . .

John Maindonald email: john.maindonald at anu.edu.au
phone: +61 2 (6125) 3473 fax: +61 2 (6125) 5549
Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27), Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm

On 21/03/2011, at 10:51 PM, M.S.Muller wrote:
Dear all, I'm trying to analyze some strongly overdispersed Poisson-distributed data using R's mixed effects model function "lmer". Recently, several people have suggested incorporating an observation-level random effect, which would model the excess variation and solve the problem of underestimated standard errors that arises with overdispersed data. It seems to be working, but I feel uneasy using this method because I don't actually understand conceptually what it is doing.

Does it package up the extra, non-Poisson variation into a miniature variance component for each data point? But then I don't understand how one ends up with non-zero residuals, and why one can't just do this for any analysis (even with normally-distributed data) in which one would like to reduce noise. I may be way off base here, but does this approach model some kind of mixture distribution that's a combination of Poisson and whatever distribution the extra variation is?

I've read that people often use a negative binomial distribution (aka Poisson-gamma) to model overdispersed count data, in which they assume that the process is Poisson (so they use a log link) but the extra variation has a gamma distribution (in which variance is proportional to the square of the mean). The frequently cited paper by Elston et al. (2001) describes modeling a Poisson-lognormal distribution in which overdispersion arises from errors taking on a lognormal distribution. Is the approach of using the observation-level random effect doing something similar, simply assuming some kind of Poisson-normal mixture distribution? Does this approach therefore assume that the observation-level variance is normally distributed? If anyone could give me any guidance on this, I would appreciate it very much. Martina Muller
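Roughly, yes: an observation-level random effect on the log-link scale amounts to assuming a Poisson-lognormal mixture, i.e. each count is Poisson given its own normally distributed deviation on the linear-predictor scale. A short base-R simulation (purely illustrative, with made-up parameter values) shows the overdispersion this generates:

```r
## Poisson-lognormal: each observation gets its own N(0, sigma^2) deviation on
## the log scale; marginally var(y) > mean(y), even though y_i | eta_i is Poisson
set.seed(42)
n     <- 10000
mu    <- log(5)      # common mean on the log-link scale
sigma <- 1           # sd of the observation-level random effect
eta   <- rnorm(n, mean = mu, sd = sigma)   # linear predictor per observation
y     <- rpois(n, lambda = exp(eta))

mean(y)  # exceeds exp(mu) = 5, since E[exp(N(mu, sigma^2))] = exp(mu + sigma^2/2)
var(y)   # far exceeds mean(y): the extra variance is the overdispersion
```

With sigma = 0 this collapses back to an ordinary Poisson, with variance approximately equal to the mean.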
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
------------------------------

Message: 5
Date: Mon, 21 Mar 2011 23:16:53 -0400
From: Shankar Lanke <shankarlanke at gmail.com>
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] Library (lattice)
Dear All,
I am looking for code to plot an X versus multiple-Y scatter plot (table below). I used the following R code; however, I am unable to figure out how to plot lines for Y1 and show the ID number on top of each panel.
library("lattice")
xyplot(log(Y1) + log(Y2) ~ X | ID, layout = c(0, 4))
ID   X    Y1   Y2
1    0.5  0.4  1
1    0.2  0.1  2
1    1    2    3
1    4    5    4
2    2.5  3    5
2    3    4    6
2    0.5  0.4  7
2    0.2  0.1  8
2    1    2    9
2    4    5    8
2    2.5  3    7
2    3    4    6
3    0.5  0.4  5
3    0.2  0.1  4
3    1    2
3    4    5
3    2.5  3
3    3    4

Thank you very much for your suggestions.
Regards, Shankar Lanke, Ph.D., University at Buffalo
Office # 716-645-4853 Fax # 716-645-2886 Cell # 678-232-3567
------------------------------
Message: 6
Date: Mon, 21 Mar 2011 21:02:54 -0700
From: Joshua Wiley <jwiley.psych at gmail.com>
To: Shankar Lanke <shankarlanke at gmail.com>
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Library (lattice)
Dear Shankar,
It seems like this is more of a question for R-help than mixed
models, but in any case, I think this does what you want. See
comments in the code for explanation.
HTH,
Josh
############################
## Only using part of your data
dat <- read.table(textConnection("
ID X Y1 Y2
1 0.5 0.4 1
1 0.2 0.1 2
1 1 2 3
1 4 5 4
2 2.5 3 5
2 3 4 6
2 0.5 0.4 7
2 0.2 0.1 8
2 1 2 9
2 4 5 8
2 2.5 3 7
2 3 4 6
3 0.5 0.4 5
3 0.2 0.1 4"), header = TRUE)
closeAllConnections()

## Sorting will make the lines prettier
dat <- dat[order(dat$ID, dat$X), ]

## Convert ID to a factor to show its labels
dat$ID <- factor(dat$ID)

require(lattice)
## type = "l" specifies lines rather than points
xyplot(log(Y1) + log(Y2) ~ X | ID, data = dat, type = "l", layout = c(0, 4))
############################
-- Joshua Wiley, Ph.D. Student, Health Psychology, University of California, Los Angeles, http://www.joshuawiley.com/

------------------------------

Message: 7
Date: Tue, 22 Mar 2011 08:24:51 +0000
From: "Franssens, Samuel" <Samuel.Franssens at econ.kuleuven.be>
To: "r-sig-mixed-models at r-project.org" <r-sig-mixed-models at r-project.org>
Subject: Re: [R-sig-ME] generalized mixed linear models, glmmPQL and GLMER give very different results that both do not fit the data well...

Thanks a lot for taking the time to help. At least now I know I am not doing something silly. I re-analyzed data from an earlier experiment, where we did a wrong analysis (treating proportions of correct responses per subject as a continuous variable and then doing a repeated-measures ANOVA without correction; http://ppw.kuleuven.be/reason/wim/data/COGNIT_2010_correctedproofs.pdf , exp. 1). In that experiment we used the same design, but there was no power manipulation (so you have the same 8 reasoning problems per person, two types of problem, conflict & no conflict, but no between-subjects variable). Here, glmer gives very nice results. I look forward to reading a chapter in your book on GLMMs, as this is a very common design in (social) psychology (and also very often the wrong analysis is done, cf. above :-) )

Sam.

-----Original Message-----
From: Douglas Bates [mailto:dmbates at gmail.com]
Sent: Monday 21 March 2011 6:36 PM
To: Franssens, Samuel
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] generalized mixed linear models, glmmPQL and GLMER give very different results that both do not fit the data well...

On Sat, Mar 19, 2011 at 6:17 AM, Franssens, Samuel <Samuel.Franssens at econ.kuleuven.be> wrote:
Thanks for the insightful reply! I've studied linear mixed models at undergraduate level, but I was unfamiliar with GLMMs. It was kind of hard at first to find out how to analyze these data, but I'm glad I found this group. I'm also trying to read the book on ecology by Zuur (2009).
My data come from an experiment in which people have to solve two types of reasoning problems, 4 of each kind. The noconflict problems are filler problems which are very easy, the conflict problems are harder, some people will get all 4 wrong, some will get all 4 right. I could leave the no conflict problems out of the analysis, but I want to conclude that the power manipulation had no effect on performance on noconflict problems, but it did on the conflict problems (to rule out alternative explanations such as decreased motivation/attention). I think an analysis on the full data set is preferred over 2 separate analyses per problem type.
Unfortunately the results don't indicate an interaction between the power manipulation and the conflict/noconflict nature of the question. I would agree that in general we prefer to do an "omnibus" analysis incorporating all the observed data. Unfortunately with a binary response there can be circumstances where you can't really get that much out of the data, precisely because it is so homogeneous. Paradoxically the difficult cases to fit with generalized linear models are those where the "signal" is very strong - the so-called complete separation cases. In this case having almost all of the noconflict questions answered correctly means that you get very little information from those responses.
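The complete-separation problem mentioned above can be reproduced with base R's glm() on toy data (hypothetical, not from this experiment): when a predictor perfectly predicts a binary response, the likelihood has no finite maximum and the estimates and standard errors blow up.

```r
## Toy complete separation: y == 1 exactly when x > 3
x <- c(1, 2, 3, 4, 5, 6)
y <- c(0, 0, 0, 1, 1, 1)

## glm() warns that fitted probabilities numerically 0 or 1 occurred and stops
## at a huge slope; the reported standard errors are essentially meaningless
fit <- suppressWarnings(glm(y ~ x, family = binomial))
coef(fit)
summary(fit)$coefficients[, "Std. Error"]
```

The near-perfect accuracy on the noconflict problems puts that part of the data close to this regime, which is why those responses carry so little information.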
I made the histograms and this confirms your expectations: for the no conflict problems, in all three groups nobody scores less than 3/4, which is also evident from the means in these cells. For the conflict problems, the data have a similar distribution in all three groups; most mass is on 4/4, with 0/4, 1/4, 2/4, and 3/4 getting almost equal mass. So indeed, variability is large.
I tried to fit some models with effect of type also varying over subjects, because I thought this would make sense intuitively (some people will be very good at noconflict, but very bad at conflict; others will be good at both), but then I consistently get correlations between random effects of -1, which I thought was an indication of convergence problems, so I left this term out.
I have read about the different estimation approaches used in GLMMs and I've noticed experts recommend not using PQL. So I will now focus on Laplace and GHQ (I knew this was possible in glmer, but did not know how large nAGQ should be, therefore I also used glmmML). nAGQ = 7 and 9 give almost exactly the same results, but they differ a bit from Laplace:
Generalized linear mixed model fit by the adaptive Gaussian Hermite approximation
Formula: accuracy ~ f_power * f_type + (1 | f_subject)
   Data: syllogisms
   AIC   BIC logLik deviance
 441.5 473.2 -213.7    427.5
Random effects:
 Groups    Name        Variance Std.Dev.
 f_subject (Intercept) 4.7286   2.1745
Number of obs: 688, groups: f_subject, 86
Fixed effects:
                             Estimate Std. Error z value Pr(>|z|)
(Intercept)                  1.470526   0.493530   2.980  0.00289 **
f_powerhp                    0.124439   0.690869   0.180  0.85706
f_powerlow                   2.020807   0.833203   2.425  0.01529 *
f_typenoconflict             3.267835   0.643001   5.082 3.73e-07 ***
f_powerhp:f_typenoconflict   0.219783   0.927490   0.237  0.81268
f_powerlow:f_typenoconflict -0.005484   1.455748  -0.004  0.99699
and
Generalized linear mixed model fit by the Laplace approximation
Formula: accuracy ~ f_power * f_type + (1 | f_subject)
   Data: syllogisms
   AIC   BIC logLik deviance
   406 437.7   -196      392
Random effects:
 Groups    Name        Variance Std.Dev.
 f_subject (Intercept) 4.9968   2.2353
Number of obs: 688, groups: f_subject, 86
Fixed effects:
                            Estimate Std. Error z value Pr(>|z|)
(Intercept)                  1.50745    0.50507   2.985  0.00284 **
f_powerhp                    0.13083    0.70719   0.185  0.85323
f_powerlow                   2.04121    0.85308   2.393  0.01672 *
f_typenoconflict             3.28715    0.64673   5.083 3.72e-07 ***
f_powerhp:f_typenoconflict   0.21680    0.93165   0.233  0.81599
f_powerlow:f_typenoconflict -0.01199    1.45807  -0.008  0.99344
In these cases the fits using adaptive Gauss-Hermite quadrature are considered more reliable. It's actually a good thing that the results from aGHQ and the Laplace approximation are similar. They should be.
I don't really know what criterion I should use to choose between models. Are any of the two actually useful, or are my data not fit for this kind of analysis? Is there another kind of analysis I could try?
I would be concerned about the very large estimated standard deviation of the random effects for subject. You may be able to draw conclusions about the power treatment and the type of question, but I wouldn't be terribly confident of the results. Others reading the list may have better suggestions than I can provide.
Thanks a lot for your time. I also think I understand the comment on the random effects :)

-----Original Message-----
From: Douglas Bates [mailto:dmbates at gmail.com]
Sent: Saturday 19 March 2011 10:02 AM
To: David Duffy
Cc: Franssens, Samuel; r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] generalized mixed linear models, glmmPQL and GLMER give very different results that both do not fit the data well...

On Fri, Mar 18, 2011 at 9:54 PM, David Duffy <davidD at qimr.edu.au> wrote:
On Fri, 18 Mar 2011, Franssens, Samuel wrote:
Hi, I have the following type of data: 86 subjects in three independent groups (high power vs low power vs control). Each subject solves 8 reasoning problems of two kinds: conflict problems and noconflict problems. I measure accuracy in solving the reasoning problems. To summarize: binary response, 1 within-subject var (TYPE), 1 between-subject var (POWER).

I wanted to fit the following model: for problem i, person j:
logodds(Y_ij) = b_0j + b_1j TYPE_ij
with b_0j = b_00 + b_01 POWER_j + u_0j and b_1j = b_10 + b_11 POWER_j
I think it makes sense, but I'm not sure. Here are the observed cell means:

            conflict    noconflict
control     0.6896552   0.9568966
high        0.6935484   0.9677419
low         0.8846154   0.9903846

GLMER gives me:

Formula: accuracy ~ f_power * f_type + (1 | subject)
   Data: syllogisms
Random effects:
 Groups  Name        Variance Std.Dev.
 subject (Intercept) 4.9968   2.2353
Number of obs: 688, groups: subject, 86
Fixed effects:
                            Estimate Std. Error z value Pr(>|z|)
(Intercept)                  1.50745    0.50507   2.985  0.00284 **
f_powerhp                    0.13083    0.70719   0.185  0.85323
f_powerlow                   2.04121    0.85308   2.393  0.01672 *
f_typenoconflict             3.28715    0.64673   5.083 3.72e-07 ***
f_powerhp:f_typenoconflict   0.21680    0.93165   0.233  0.81599
f_powerlow:f_typenoconflict -0.01199    1.45807  -0.008  0.99344

Strange thing is that when you convert the estimates to probabilities, they are quite far off. For control, no conflict (intercept), the estimation from glmer is 1.5 -> 81% and from glmmPQL is 1.14 -> 75%, whereas the observed is 68%. Am I doing something wrong?
You are forgetting that your model includes a random intercept for each subject.
To follow up on David's comment a bit, you have a very large estimated standard deviation for the random effects for subject. An estimated standard deviation of 2.23 in the glmer fit is huge. On the scale of the linear predictor the random effects would swamp the contribution of the fixed effects - different subjects' random effects could easily differ by 8 or more (+/- 2 s.d.). Transformed to the probability scale that takes you from virtually impossible to virtually certain. To get that large an estimated standard deviation you would need different subjects exposed to the same experimental conditions who get completely different results (all wrong vs all right). The estimate of the fixed effect for type is also very large, considering that this is a binary covariate.

I would suggest plotting the data, say by first grouping the subjects into groups according to power, then by subject, then by type. You should see a lot of variability in responses according to subject within power.

Regarding the difference between the glmer and the glmmPQL results, bear in mind that glmmPQL is based on successive approximation of the GLMM by an LMM. This case is rather extreme and the approximation is likely to break down. In fact, for the glmer fit, I would recommend using nAGQ=7 or nAGQ=9 to see if that changes the estimates substantially. The difference between the default evaluation of the deviance by the Laplace approximation and the more accurate adaptive Gauss-Hermite quadrature may be important in this case.

I just saw your other message about the properties of the random effects. The values returned are the conditional modes of the random effects given the observed data. The mean 0 and standard deviation of 2.3 apply to the marginal or unconditional distribution of the random effects, not the conditional distribution.
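The arithmetic in this exchange can be checked directly in base R: the fixed-effects intercept is the log-odds for a subject whose random effect is zero, and the gap from the observed cell mean (~0.69) comes from averaging the nonlinear inverse link over the subject distribution (the estimates below are taken from the glmer fit quoted in the thread):

```r
b0   <- 1.50745   # glmer intercept: control group, conflict problems
sd_u <- 2.2353    # estimated sd of the subject random effects

plogis(b0)             # ~0.82: probability for a *typical* subject (u = 0)
plogis(b0 - 2 * sd_u)  # ~0.05: a subject 2 sd below average
plogis(b0 + 2 * sd_u)  # ~0.997: a subject 2 sd above average

## Marginal (population-averaged) probability: integrate over the random
## effect by simulation; this is what the observed cell mean estimates
set.seed(1)
marg <- mean(plogis(b0 + rnorm(1e5, mean = 0, sd = sd_u)))
marg                   # close to the observed 0.69, not to plogis(b0)
```

So the glmer estimates are not really "far off": conditional (subject-specific) and marginal probabilities simply differ when the random-effect variance is this large.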
------------------------------
End of R-sig-mixed-models Digest, Vol 51, Issue 54