interactions
Dear Jason, Are you interested in the variability of the random intercept for each day along marker? If you just want to allow a different random intercept per day and per marker, you could consider (1|marker/day) instead of (day|marker). If day has 15 levels then (day|marker) has to estimate 15 variances and 105 covariances! This is probably the reason why the model doesn't converge. (1|marker/day) requires only 2 variances to be estimated. HTH, Thierry ---------------------------------------------------------------------------- ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek team Biometrie & Kwaliteitszorg Gaverstraat 4 9500 Geraardsbergen Belgium Research Institute for Nature and Forest team Biometrics & Quality Assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 Thierry.Onkelinx at inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey
-----Oorspronkelijk bericht-----
Van: r-sig-mixed-models-bounces at r-project.org
[mailto:r-sig-mixed-models-bounces at r-project.org] Namens
Iasonas Lamprianou
Verzonden: maandag 29 maart 2010 23:54
Aan: r-sig-mixed-models at r-project.org
Onderwerp: Re: [R-sig-ME] interactions
Dear colleagues,
apologies for the long question, but I really need some help.
It is not a case of academic laziness, I have done my
homework but now I need some advice.
I have 10000 students nested in 50 schools, each student
taking a test. The tests are randomly grouped in batches of
20 scripts (around 500 batches). Then, two random markers
(out of a pool of 57 markers) mark each batch blindly. Each
marker marks around 17 random batches of 20 scripts. If the
two random markers disagree by more than 10% on the total
score for a specific student, then the test of the student is
being given to another marker randomly for a third blind
marking. So, I have students nested within schools which are
nested in areas. Then, scripts (the tests of the students)
nested into batches. Then, batches crossed with markers.
However, the actual marking lasted around 15 days. One batch
may be marked by the first marker on day 1 and then marked by
the second marker on the next day (or the day after etc). So
the batches are also crossed by days.
My Research question is: are there markers who are
differentially severe or lenient on different days?
I use this model:
lmer(score ~
1+day+gender+school+(1+day|marker)+(1|candidate)+(1|batch),
mg2007_sub)
I assume that this model is appropriate, because it models
candidates as random effects (each candidate has an ability
estimate based on the scores he/she received from the two or
the three markers if his/her script was remarked by a third
marker). Also, every batch is modelled as a random effect.
Every marker is modelled as a random effect but I allow
his/her estimate to vary by day, so I assume that this would
allow me to see if a marker behaves in a different way on
different days. So this should answer my research question.
Question 1: does this model make sense statistically (as i
formulate it in lme4?)
So far so good, but lme4 will run for 48 hours and then fail
because it reached maximum iterations. I assume that even if
I allow for more iterations, it will go on for ever.
Question 2: Why is it so slow? Why it does not converge?
Then, I decided to simplify the problem. Instead of modelling
day as a factor, I use 'd' which is the cardinal number of
day e.g. 1, 2, 3, days that passed since we started marking.
I assume that this is an interval scale. If a marker becomes
(linearly) more lenient, he/she would 'like' this model. So I run
lmer(score ~
1+day+gender+dvhool+(1+d|marker)+(1|candidate)+(1|batch), mg2007_sub)
please notice that (1+day|marker) where day is a factor with
15 levels, becomes (1+d|marker) which is numeric variable
and this gives
Error terms:
Groups Name Std.Dev. Corr
candidate (Intercept) 16.85
batch (Intercept) 3.49
marker (Intercept) 4.68
d 0.38 -0.77
Residual 4.83
---
number of obs: 18001, groups: candidate, 9402; batch, 470;
marker, 59 AIC = 138232, DIC = 138388.9 deviance = 138230.4
Question 3: How do I know that the variance of 'd' is
statistically significant? MlWin gives some error estimates.
Question 4: The correlation of -0.77 is a problem?
I hope somebody can help me, this has taken me three days up to now...
Thank you
jason
Dr. Iasonas Lamprianou
Assistant Professor (Educational Research and Evaluation)
Department of Education Sciences European University-Cyprus
P.O. Box 22006
1516 Nicosia
Cyprus
Tel.: +357-22-713178
Fax: +357-22-590539
Honorary Research Fellow
Department of Education
The University of Manchester
Oxford Road, Manchester M13 9PL, UK
Tel. 0044 161 275 3485
iasonas.lamprianou at manchester.ac.uk
--- On Mon, 29/3/10, r-sig-mixed-models-request at r-project.org
<r-sig-mixed-models-request at r-project.org> wrote:
From: r-sig-mixed-models-request at r-project.org
<r-sig-mixed-models-request at r-project.org>
Subject: R-sig-mixed-models Digest, Vol 39, Issue 47 To: r-sig-mixed-models at r-project.org Date: Monday, 29 March, 2010, 21:25 Send R-sig-mixed-models mailing list submissions to ??? r-sig-mixed-models at r-project.org To subscribe or unsubscribe via the World Wide Web, visit ??? https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models or, via email, send a message with subject or body 'help' to ??? r-sig-mixed-models-request at r-project.org You can reach the person managing the list at ??? r-sig-mixed-models-owner at r-project.org When replying, please edit your Subject line so it is more specific than "Re: Contents of R-sig-mixed-models digest..." Today's Topics: ???1. Re: The issue of increasing maximum number of iterations ? ? ? (Ben Bolker) ???2. poisson GLMER with identity link (Tim Carnus) ???3. Re: unbalanced data in nested lmer model (Jana B?rger) ???4. Re: Correlation of -1: is it a problem? (Eric Castet) ???5. Re: unbalanced data in nested lmer model (Luca Borger) ???6. Re: unbalanced data in nested lmer model (Ben Bolker)
----------------------------------------------------------------------
Message: 1 Date: Mon, 29 Mar 2010 15:50:30 +0000 (UTC) From: Ben Bolker <bolker at ufl.edu> To: r-sig-mixed-models at r-project.org Subject: Re: [R-sig-ME] The issue of increasing maximum number of ??? iterations Message-ID: <loom.20100329T171254-536 at post.gmane.org> Content-Type: text/plain; charset=us-ascii Martin Stjernman <Martin.Stjernman at ...> writes:
Dear listmembers! I have also had trouble with the maxIter specification
in (g)lmer.
I've tried to increase the number of iterations (to 1000) via the control statement in lmer
but it still stops at 300 (the default). We are now a number of users having problem with this
but
I haven't seen any solution to the problem on this list and I would really appreciate any hint on what I
am doing wrong.
I am running on a Vista 64-bit machine and my model
specification and
? session info follows below. Please let me know if any other information needs to be
attached. ? Looking at the source code in lmer.R, this seems like a moderately obvious hole: on or about line 714 (I may be off by a couple of lines) we have ? ? if (missing(verbose)) verbose <- cv$msVerbose ### FIXME: issue a warning if the model argument is FALSE.? It is ignored. adding the lines ? ? FL$dims["mxit"] <- cv$maxIter ? ? FL$dims["mxfn"] <- cv$maxFN ? appears to work: running (gm1 <- glmer(cbind(incidence, size - incidence) ~ period + ???(1 | herd), family=binomial, data=cbpp, control=list(maxIter=1000))) produces a model with the correct iteration number listed in its guts. ? The next question is whether you or someone at your institution is capable of modifying lmer.R appropriately and rebuilding the package ... ------------------------------ Message: 2 Date: Mon, 29 Mar 2010 17:45:08 +0100 From: Tim Carnus <Tim.Carnus at ucd.ie> To: r-sig-mixed-models at r-project.org Subject: [R-sig-ME] poisson GLMER with identity link Message-ID: <1269881108.3596.24.camel at tim-laptop> Content-Type: text/plain; charset=UTF-8 Dear list, I am trying to fit a number of GLMERs to count data with an additive model (in the predictors) that requires the use of the identity link function. For about half of my response variables this causes no problems. However in a number of cases the model fitting runs into problems with regards estimation of negative mean (for e.g. the error message in mer_finalize: mu[i] must be positive: mu = 1.76267e-312, i = 13075456). As far as I understand this is well known and documented, and guarding against that possibility is necessary, and built in to say the glm() function. My question then is, how can I do this with lmer? (ie how can I specify the constraints necessary to fit these types of models, if at all possible) Best regards, Tim Carnus ------------------------------ Message: 3 Date: Mon, 29 Mar 2010 16:17:41 +0200 From: Jana B?rger <jana.buerger at uni-rostock.de> To: Andrew Dolman <andydolman at gmail.com> Cc: "r-sig-mixed-models at r-project.org" ??? <r-sig-mixed-models at r-project.org> Subject: Re: [R-sig-ME] unbalanced data in nested lmer model Message-ID: <4BB0B685.70506 at uni-rostock.de> Content-Type: text/plain; charset="UTF-8"; format=flowed Dear Andrew and other list members, As I described in an earlier
post(https://stat.ethz.ch/pipermail/r-sig-mixed-models/2010q1/ 003503.html)
my data is actually hierarchical down to the level of fields within farms. There is more than 500 cases on 8 farms in 6 regions. Would you not think that gives enough power to distinguish within region variability vs. between regions? Moreover I don't understand your argument that fitting random efects with less than 5 levels was dodgy, as often examples in the books have 3 samples from one beach, or 3 laboratory workers within one laboratory. These are less than 5 levels, are they not? Regards, Jana Andrew Dolman schrieb:
Dear Jana, ? >An anova(lm1, lm2)?
lm1<-lmer(y~x1+x2+...+(1|region)+(1|region:farm));
lm2<-lmer(y+x1+x2+...+(1|farm)) said models did not
differ significantly
and AIC was about the same. So I know there is no
additional explanatory
power including the region term. ? >Yet, I would like to keep the region effect
in the model to separate
and compare the effect size of region vs. farm. Is it
valid to do so
even if? some of the regions are only represented
by one farm?
I don't think you have the data to ask questions about
differences
between regions as distinct from differences between
farms. Look at it
this way. If you were just doing a normal comparison
between regions and
you only looked at 1 or 2 farms per region, would you
have the
statistical power to say that differences were due to
region rather than
farm? Answer = No. Similarly, are the differences between the farms
because they are in
different regions or just normal variation between
farms? Well you only
have 2 farms per region so it's hard to tell. Maybe
you just have enough
data if pairs of farms within regions are always very
similar and
differences between regions large. Also. Fitting random effects with fewer than 5 levels
is dodgy, and you
only have 2 levels of farm per region, sometimes 1. Perhaps you could look at it this way. compare m1 <- lmer (y~(1|region)) m2 <- lmer (y~(1|farm)) If m2 is better then there is variation between farms
within regions, if
there's no difference then region accounts for most of
the variation.
BUT you've not got much power to detect farm effects
within regions, so
a null result is not strong evidence for the absence
of farm variation
within regions. Andy. ? andydolman at gmail.com
<mailto:andydolman at gmail.com>
-- Jana B?rger Universit?t Rostock Agrar-? und Umweltwissenschaftliche Fakult?t FG Phytomedizin Satower Stra?e 48 18059 Rostock Tel. 0381-498 31 71 Fax.0381-498 31 62 ------------------------------ Message: 4 Date: Mon, 29 Mar 2010 19:45:15 +0200 From: Eric Castet <Eric.Castet at incm.cnrs-mrs.fr> To: Douglas Bates <bates at stat.wisc.edu> Cc: r-sig-mixed-models at r-project.org Subject: Re: [R-sig-ME] Correlation of -1: is it a problem? Message-ID: <4BB0E72B.2070404 at incm.cnrs-mrs.fr> Content-Type: text/plain Dear Doug, Thanks for your reply. I've done as you suggested (your point b/), i.e. I've fit another model that considers 'couleurs' within 'nom' However, after running anova() (likelihood ratio test), I find that I should keep the initial model that contains the -1 correlation:
> anova (jb.lmer1, jb.lmer2)
Data: jb Models: jb.lmer2: lRT ~ couleurs + (1 | nom) + (1 | nom:couleurs) jb.lmer1: lRT ~ couleurs + (1 + couleurs | nom) ? ? ? ? ? Df???AIC???BIC? logLik? Chisq Chi Df Pr(>Chisq) jb.lmer2? 5 17260 17295 -8625.2 jb.lmer1? 6 17251 17293 -8619.4 11.584? ? ? 1? 0.0006654 *** So, the question is now: a/ How can I justify to refuse the initial model that contains the -1 correlation? b/ And a parallel question is: what is wrong with the -1 correlation? Is it because it is exactly -1 ? Would it still be a problem if it were -0.99 ? c/ Ultimately, how can I report what appears to me as an important result: namely the high correlation between the intercept and the 'couleurs' effect for each subject? Thanks, Eric
a/ is it really a statistical (or numerical)
problem to have a -1
correlation in the model that I should keep? ? ? ?
Yes, it is.? The fitted model is has a singular
variance-covariance
matrix for the random effects and that is not
good.? In fact, it is no
longer a linear mixed model. ? ?
b/ is it possible to remove the correlation
between Intercept and
Couleurs, as I would do if Couleurs were not a
categorical factor?
? ? ?
I would fit another model of IRT ~ couleurs + (1|nom:couleurs) + (1|nom) and see how that works.? This model is, in some
sense, intermediate to
the models that you have fit above. ? ?
Thanks in advance, Eric Castet -- Eric Castet Institut de Neurosciences Cognitives de la
M?diterran?e -- INCM CNRS
31 chemin Joseph Aiguier 13402 Marseille cedex 20 (France) tel : (+33)(0)4-91-16-43-34 fax : (+33) (0)4-91-16-44-98 UMR 6193 du CNRS Universit? Aix-Marseille II http://www.incm.cnrs-mrs.fr/equipedyva.php http://www.incm.cnrs-mrs.fr/pperso/ecastet.php ? ? ?
???[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org?
mailing list
? ?
-- Eric Castet Institut de Neurosciences Cognitives de la M?diterran?e -- INCM CNRS 31 chemin Joseph Aiguier 13402 Marseille cedex 20 (France) tel : (+33)(0)4-91-16-43-34 fax : (+33) (0)4-91-16-44-98 UMR 6193 du CNRS Universit? Aix-Marseille II http://www.incm.cnrs-mrs.fr/equipedyva.php http://www.incm.cnrs-mrs.fr/pperso/ecastet.php ??? [[alternative HTML version deleted]] ------------------------------ Message: 5 Date: Mon, 29 Mar 2010 13:55:01 -0400 From: "Luca Borger" <lborger at uoguelph.ca> To: Jana B?rger <jana.buerger at uni-rostock.de>,??? "Andrew Dolman" ??? <andydolman at gmail.com> Cc: r-sig-mixed-models at r-project.org Subject: Re: [R-sig-ME] unbalanced data in nested lmer model Message-ID: <D63BEA7FF21244B88504AA33F4E7C14D at lborger> Content-Type: text/plain; format=flowed; charset="utf-8"; ??? reply-type=response Hello, as Andrew and others already explained:
There is more than 500 cases
Fine, this might give you reasonable estimates about how your y is affected by your fixed effects covariates (x1,x2,...)
(...) on 8 farms in 6 regions.
#and, from your previos post
For 2 of 8 regions there is only 1 farm, the other
regions have 2 farms. thus no way to estimate a difference between region or farm effects for 2 regions, and very, very limited power for the other 6 (just 2 farms per region). To make things worse your data are also quite unbalanced:
unbalance of case numbers in cells? Or would it be no
problem if cell sizes
vary between 0 and 53?
which I think means for some farms you got only one record? Anyway, to recap, probably OK data for understanding y~x1+x2 etc., insufficient data otherwise (should invest in getting data for more farms within regions, not more data for the farms you have already sampled).
Moreover I don't understand your argument that fitting
random efects with
less than 5 levels was dodgy, as often examples in the
books have 3
samples from one beach, or 3 laboratory workers within
one laboratory.
These are less than 5 levels, are they not?
These are usually toy datasets to exemplify how the approach works, I do not think they make a claim that the resulting variance estimates are very reliable (think in the Zuur etal. mixed effects book you can find more realistic examples, if I remember well). Plus, "level" refers to the number of beaches or the number of labs etc. and the resulting variance estimates - if less than say 5 it appears that you might be better off fitting it as a fixed effect and not trying to decompose the variance into between labs and within labs etc. Anyway, just my 2 cents and hope I explained this correctly... See also the wiki page set up by Ben Bolker: http://glmm.wikidot.com/faq e.g. you might be interested in this entry therein: Zero or very small random effects variance estimates; (...) Very small variance estimates, or very large correlation estimates, often indicates unidentifiability/lack of data (either due to exact identifiability [e.g. designs that are not replicated at an important level] or weak identifiable (designs that would be workable with more data of the same type) HTH Cheers, Luca ----- Original Message ----- From: "Jana B?rger" <jana.buerger at uni-rostock.de> To: "Andrew Dolman" <andydolman at gmail.com> Cc: <r-sig-mixed-models at r-project.org> Sent: Monday, March 29, 2010 10:17 AM Subject: Re: [R-sig-ME] unbalanced data in nested lmer model
Dear Andrew and other list members, As I described in an earlier
post(https://stat.ethz.ch/pipermail/r-sig-mixed-models/2010q1/ 003503.html)
my data is actually hierarchical down to the level of
fields within farms.
There is more than 500 cases on 8 farms in 6 regions. Would you not think that gives enough power to
distinguish within region
variability vs. between regions? Moreover I don't understand your argument that fitting
random efects with
less than 5 levels was dodgy, as often examples in the
books have 3
samples from one beach, or 3 laboratory workers within
one laboratory.
These are less than 5 levels, are they not? Regards, Jana Andrew Dolman schrieb:
Dear Jana, ? >An anova(lm1, lm2)?
lm1<-lmer(y~x1+x2+...+(1|region)+(1|region:farm));
lm2<-lmer(y+x1+x2+...+(1|farm)) said models did
not differ significantly
and AIC was about the same. So I know there is no
additional explanatory
power including the region term. ? >Yet, I would like to keep the region
effect in the model to separate
and compare the effect size of region vs. farm. Is
it valid to do so even
if? some of the regions are only represented
by one farm?
I don't think you have the data to ask questions
about differences
between regions as distinct from differences
between farms. Look at it
this way. If you were just doing a normal
comparison between regions and
you only looked at 1 or 2 farms per region, would
you have the
statistical power to say that differences were due
to region rather than
farm? Answer = No. Similarly, are the differences between the farms
because they are in
different regions or just normal variation between
farms? Well you only
have 2 farms per region so it's hard to tell.
Maybe you just have enough
data if pairs of farms within regions are always
very similar and
differences between regions large. Also. Fitting random effects with fewer than 5
levels is dodgy, and you
only have 2 levels of farm per region, sometimes
1.
Perhaps you could look at it this way. compare m1 <- lmer (y~(1|region)) m2 <- lmer (y~(1|farm)) If m2 is better then there is variation between
farms within regions, if
there's no difference then region accounts for
most of the variation. BUT
you've not got much power to detect farm effects
within regions, so a
null result is not strong evidence for the absence
of farm variation
within regions. Andy. ? andydolman at gmail.com
<mailto:andydolman at gmail.com>
-- Jana B?rger Universit?t Rostock Agrar-? und Umweltwissenschaftliche Fakult?t FG Phytomedizin Satower Stra?e 48 18059 Rostock Tel. 0381-498 31 71 Fax.0381-498 31 62
_______________________________________________ R-sig-mixed-models at r-project.org
mailing list
------------------------------ Message: 6 Date: Mon, 29 Mar 2010 16:25:12 -0400 From: Ben Bolker <bolker at ufl.edu> To: Luca Borger <lborger at uoguelph.ca> Cc: Andrew Dolman <andydolman at gmail.com>, ??? "r-sig-mixed-models at r-project.org" <r-sig-mixed-models at r-project.org>, ??? Jana B?rger <jana.buerger at uni-rostock.de> Subject: Re: [R-sig-ME] unbalanced data in nested lmer model Message-ID: <4BB10CA8.6030803 at ufl.edu> Content-Type: text/plain; charset=UTF-8 Luca Borger wrote: [Jana B?rger:]
Moreover I don't understand your argument that
fitting random efects with
less than 5 levels was dodgy, as often examples in
the books have 3
samples from one beach, or 3 laboratory workers
within one laboratory.
These are less than 5 levels, are they not?
These are usually toy datasets to exemplify how the
approach works, I do not
think they make a claim that the resulting variance
estimates are very
reliable (think in the Zuur etal. mixed effects book
you can find more
realistic examples, if I remember well). Plus, "level"
refers to the number
of beaches or the number of labs etc. and the
resulting variance estimates -
if less than say 5 it appears that you might be better
off fitting it as a
fixed effect and not trying to decompose the variance
into between labs and
within labs etc. Anyway, just my 2 cents and hope I
explained this
correctly... See also the wiki page set up by Ben Bolker: http://glmm.wikidot.com/faq e.g. you might be interested in this entry therein: Zero or very small random effects variance estimates; (...) Very small variance estimates, or very large
correlation estimates, often
indicates unidentifiability/lack of data (either due
to exact
identifiability [e.g. designs that are not replicated
at an important level]
or weak identifiable (designs that would be workable
with more data of the
same type)
? I just added this to the FAQ: Should I treat factor xxx as fixed or random? This is in general a far more difficult question than it seems on the surface. There are many competing philosophies and definitions (see Gelman 2xxx). One point of particular relevance to 'modern' mixed model estimation (rather than 'classical' method-of-moments estimation) is that, for practical purposes, there must be a reasonable number of random-effects levels (e.g. blocks) ? more than 5 or 6 at a minimum. ? ? e.g., from Crawley (2002) p. 670: "Are there enough levels of the factor in the data on which to base an estimate of the variance of the population of effects? No, means [you should probably treat the variable as] fixed effects." Some researchers (who treat fixed vs random as a philosophical rather than a pragmatic decision) object to this approach. Treating factors with small numbers of levels as random will in the best case lead to very small estimates of random effects; in the worst case it will lead to various numerical difficulties such as lack of convergence, zero variance estimates, etc.. In the classical method-of-moments approach these problems do not arise (because the sums of squares are always well defined as long as there are at least two units), but the underlying problems of lack of power are there nevertheless. ???(Contributions welcome!) -- Ben Bolker Associate professor, Biology Dep't, Univ. of Florida bolker at ufl.edu / people.biology.ufl.edu/bolker GPG key: people.biology.ufl.edu/bolker/benbolker-publickey.asc ------------------------------
_______________________________________________ R-sig-mixed-models mailing list R-sig-mixed-models at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models End of R-sig-mixed-models Digest, Vol 39, Issue 47 **************************************************
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Druk dit bericht a.u.b. niet onnodig af. Please do not print this message unnecessarily. Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document. The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.