lme4, failure to converge with a range of optimisers, trust the fitted model anyway?
No, you need to treat this still as binomial data, using cbind(y,n-y) as the response where y is the number of positives in each group, and n is the total in each group. I suggest reading one of the books that discusses fitting logistic models in R, most advanced texts have a section. Introductory Statistics with R by Peter Dalgaard has the section available in Amazon. You also still need a random effect for the cluster. While I'm thinking of it, should clusters and country random effects have been crossed. Generally the sampling is setup so that clusters are nested within countries which requires a different syntax.
On 5 April 2015 at 10:41, Hans Ekbrand <hans.ekbrand at gmail.com> wrote:
On Sat, Apr 04, 2015 at 07:36:04PM -0400, Ben Bolker wrote:
So, perhaps I could aggregate the individual level data to the cluster level, and do without the random term for cluster? I mean, calculate the proportions of yes in each cluster and use that as the dependent variable. This would, I assume, require that each cluster was given a weight that corresponded to the number of individuals in it - or I would not be able to say anything about probabilities at the individual level, right?
I would say pretty much any time you can aggregate without losing information, you should, for ease of computation (this accords with the Murtaugh 2007 "Simplicity and complexity" paper that I cite here on a regular basis). If all of your covariates are at the cluster level
[ Thanks for your advice, it's much appreciated. ] The covariates are not all at the cluster level. Actually only one covariate is at the cluster level, the others are at the country level.
then this reduces to a binomial GLM (you can account for overdispersion either by using a quasi-binomial model in glm() or by staying with glmer() and keeping the random effect of cluster (now an observation-level random effect).
Since I still have the country level to account for, I need to use glmer().
I have created an aggregated data set with a variable for the
proportion deprived=TRUE in each cluster, and a variable for the
number of cases in that cluster (to use as weight).
Perhaps this is not a question specific for mixed models, but what
family and link function would be appropriate now? This was the best I
could come up with:
library(car)
lm1 <- lmer(logit(Prop.deprived.in.cluster) ~ (1|Country) + QoG + GDPLog +
Rural * KilledPerMillion5Log, data = my.small.df, weights =
my.small.df$weight)
Linear mixed model fit by REML ['lmerMod']
Formula: logit(Prop.deprived.in.cluster) ~ (1 | Country) + QoG + GDPLog +
Rural * KilledPerMillion5Log
Data: my.small.df
Weights: my.small.df$weight
REML criterion at convergence: 157709.8
Scaled residuals:
Min 1Q Median 3Q Max
-4.9983 -0.5626 -0.0622 0.4371 6.1171
Random effects:
Groups Name Variance Std.Dev.
Country (Intercept) 1.768 1.33
Residual 139.292 11.80
Number of obs: 38028, groups: Country, 65
Fixed effects:
Estimate Std. Error t value
(Intercept) 1.25542 1.90567 0.66
QoG 0.39120 0.40850 0.96
GDPLog -0.51799 0.23031 -2.25
Ruralyes 1.91545 0.02232 85.80
KilledPerMillion5Log 0.06190 0.28557 0.22
Ruralyes:KilledPerMillion5Log 0.35303 0.03728 9.47
Correlation of Fixed Effects:
(Intr) QoG GDPLog Rurlys KlPM5L
QoG 0.571
GDPLog -0.989 -0.486
Ruralyes -0.013 0.001 0.006
KlldPrMll5L -0.013 0.239 -0.009 0.045
Rrlys:KPM5L 0.004 -0.001 0.000 -0.519 -0.084
my.small.df is available here if someone wants to have a go at it
http://hansekbrand.se/code/my.small.df.RData (1.1 MB)
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
*Ken Beath* Lecturer Statistics Department MACQUARIE UNIVERSITY NSW 2109, Australia Phone: +61 (0)2 9850 8516 Building E4A, room 526 http://stat.mq.edu.au/our_staff/staff_-_alphabetical/staff/beath,_ken/ CRICOS Provider No 00002J This message is intended for the addressee named and may...{{dropped:9}}