I completely agree that analyzing a stable dataset is essential, but
I'd argue that selecting a subset containing only cases with no
missing data is actually changing the nature of the population to
which your inferences can be generalized. In effect it tacks "and who
have no missing data" onto the definition of the target population.
That is unlikely to be the population you really want to study. It
may also dramatically reduce your sample size, thereby reducing power
and precision of the estimates.
To preserve inference to the original intended population, solve the
missing data problem with either some form of imputation or by using
a full-information maximum likelihood estimation method that doesn?t
throw out cases with missing data.
Steven J. Pierce, Ph.D. Associate Director Center for Statistical
Training & Consulting (CSTAT) Michigan State University
-----Original Message----- From: Thierry Onkelinx
[mailto:thierry.onkelinx at inbo.be] Sent: Tuesday, August 30, 2016 3:30
AM To: Shadiya Al Hashmi <saah500 at york.ac.uk> Cc:
r-sig-mixed-models at r-project.org Subject: Re: [R-sig-ME] Anova II
table, df, drop1 and very complex regression models!
Dear Shadiya,
You need to do the model selection on a stable dataset. Therefore
you should create a subset which doesn't contain missing values in
the covariates. Use this subset for model selection. Then you can
refit the final model on the total dataset.
Best regards,
ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek /
Research Institute for Nature and Forest team Biometrie &
Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat
25 1070 Anderlecht Belgium
To call in the statistician after the experiment is done may be no
more than asking him to perform a post-mortem examination: he may be
able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner The combination
of some data and an aching desire for an answer does not ensure that
a reasonable answer can be extracted from a given body of data. ~
John Tukey
2016-08-30 7:23 GMT+02:00 Shadiya Al Hashmi <saah500 at york.ac.uk>:
Good morning,
I have complex data of 7 variables (6 treatment + 1 control [?age?
in the model below]) plus 18 interactions in a dataset of 2448
observations which have some missing values (NAs). The maximal
model which I have simplified as per y hypothesis is as follows.
modelAAW<-glmer(match~Listgp + vowel.quality +
stimulus.presentation + context +length + age + freq.+
Listgp:context+ Listgp:length+ Listgp:freq.+
Listgp:stimulus.presentation+ Listgp:age+ context:length+
context:freq.+ context:stimulus.presentation+ context:age+
length:freq.+ length:stimulus.presentation+ length:age+
freq.:stimulus.presentation+ freq.:age+ stimulus.presentation:age+
Listgp:stimulus.presentation + Listgp:vowel.quality +
stimulus.presentation:vowel.quality + (Listgp|stimulus) +
(stimulus.presentation+vowel.quality|listener) , data = SBAAW,
family = "binomial", control=glmerControl(optCtrl=
list(maxfun=2e5)), nAGQ =1)
I ran a binomial logistic regression analysis on the data and did
the stepwise regression manually since the drop1(modelAAW, test =
"Chisq") command yielded no results in a span of more than 16
hours. The resulting regression models are nested in the maximal
model (modelAAW).
Then, I reached the model selection step where I have to interpret
the anova table below which has degrees of freedom of zero for some
models.
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
AAWXI 49 2454.5 2738.8 -1178.2 2356.5
AAWXII 49 2456.4 2740.8 -1179.2 2358.4 0.0000 0
1.0000
AAWXIII 49 2456.4 2740.8 -1179.2 2358.4 0.0000 0
1.0000
AAWIX 51 2457.6 2753.5 -1177.8 2355.6 2.8749 2
0.2375
AAWX 51 2457.6 2753.5 -1177.8 2355.6 0.0000 0
1.0000
AAWVI 52 2458.6 2760.4 -1177.3 2354.6 0.9410 1
0.3320
AAWVII 52 2458.6 2760.4 -1177.3 2354.6 0.0075 0 <2e-16
***
AAWVIII 52 2458.6 2760.3 -1177.3 2354.6 0.0094 0 <2e-16
***
AAWV 54 2458.4 2771.7 -1175.2 2350.4 4.2412 2
0.1200
AAWIII 56 2461.9 2786.9 -1175.0 2349.9 0.4275 2
0.8076
AAWIV 56 2461.9 2786.9 -1175.0 2349.9 0.0000 0
1.0000
AAWII 57 2463.9 2794.7 -1175.0 2349.9 0.0215 1
0.8835
AAWI 60 2467.9 2816.1 -1174.0 2347.9 1.9763 3
0.5773
modelAAW 66 2474.4 2857.3 -1171.2 2342.4 5.5778 6
0.4721
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
So many people that I have consulted tell me that I shouldn?t trust
a model of df=zero and advise that I should re-run the models using
the drop1 command or simplify the maximal model but in my case I
don?t know if I will ever get results especially that it took more
than 16 hours straight with no luck and I simplified the maximal
model to the best I could.
When I checked R documentation, I read that ?when given a sequence
of objects, anova tests the models against one another in the order
specified? based on the AIC value from the smallest to the largest.
However, there is a warning that ?the comparison between two or
more models will only be valid if they are fitted to the same
dataset. This may be a problem if there are missing values and R's
default of na.action = na.omit is used?, so I?m assuming this is
the case with my models.
Now, should I select model AAWVIII since it has the least p-value
(and the least BIC value compared to AAWVII)?
The formulas of the two models in addition to AAWXI model are as
follows. The first two models are similar to each other except that
in AAWVIII the variable stimulus.presentation is deleted and in
model AAWXI the variable Listgp is deleted.
AAWVI<-glmer(match~Listgp + vowel.quality + stimulus.presentation +
context + age + freq.+ Listgp:freq.+ Listgp:stimulus.presentation+
context:length+ context:freq.+
context:stimulus.presentation+length:stimulus.presentation+
length:age+ freq.:stimulus.presentation+ freq.:age+
stimulus.presentation:age+ + Listgp:stimulus.presentation +
Listgp:vowel.quality + stimulus.presentation:vowel.quality +
(Listgp|stimulus) + (stimulus.presentation+vowel.quality|listener)
, data = SBAAW, family = "binomial", control=glmerControl(optCtrl=
list(maxfun=2e5)), nAGQ =1)
AAWVIII<-glmer(match~Listgp + vowel.quality + context + age +
Listgp:freq.+ Listgp:stimulus.presentation+ context:length+
context:freq.+
context:stimulus.presentation+length:stimulus.presentation+
length:age+ freq.:stimulus.presentation+ freq.:age+
stimulus.presentation:age+ + Listgp:stimulus.presentation +
Listgp:vowel.quality + stimulus.presentation:vowel.quality +
(Listgp|stimulus) + (stimulus.presentation+vowel.quality|listener)
, data = SBAAW, family = "binomial",
control=glmerControl(optCtrl=list(maxfun=2e5)), nAGQ =1)
AAWXI<-glmer(match~vowel.quality + context + age + Listgp:freq.+
Listgp:stimulus.presentation+ context:length+ context:freq.+
context:stimulus.presentation+length:stimulus.presentation+
freq.:stimulus.presentation+ freq.:age+ stimulus.presentation:age+
+ Listgp:stimulus.presentation + Listgp:vowel.quality +
stimulus.presentation:vowel.quality + (Listgp|stimulus) +
(stimulus.presentation+vowel.quality|listener) , data = SBAAW,
family = "binomial",
control=glmerControl(optCtrl=list(maxfun=2e5)), nAGQ =1)
I would appreciate your help with this.
-- Shadiya
[[alternative HTML version deleted]]