Skip to content

Anova II table, df, drop1 and very complex regression models!

6 messages · Thierry Onkelinx, Shadiya Al Hashmi, Steven J. Pierce +1 more

#
Good morning,


I have complex data of 7 variables (6 treatment + 1 control [?age? in the
model below]) plus 18 interactions in a dataset of 2448 observations which
have some missing values (NAs).  The maximal model which I have simplified
as per y hypothesis is as follows.



modelAAW<-glmer(match~Listgp + vowel.quality + stimulus.presentation +
context +length + age + freq.+ Listgp:context+ Listgp:length+ Listgp:freq.+
Listgp:stimulus.presentation+ Listgp:age+ context:length+ context:freq.+
context:stimulus.presentation+ context:age+ length:freq.+
length:stimulus.presentation+ length:age+ freq.:stimulus.presentation+
freq.:age+ stimulus.presentation:age+  Listgp:stimulus.presentation +
Listgp:vowel.quality + stimulus.presentation:vowel.quality +
(Listgp|stimulus) + (stimulus.presentation+vowel.quality|listener) , data =
SBAAW, family = "binomial", control=glmerControl(optCtrl=list(maxfun=2e5)),
nAGQ =1)



I ran a binomial logistic regression analysis on the data and did the
stepwise regression manually since the drop1(modelAAW, test = "Chisq")
command yielded no results in a span of more than 16 hours. The resulting
regression models are nested in the maximal model (modelAAW).



Then, I reached the model selection step where I have to interpret the
anova table below which has degrees of freedom of zero for some models.



         Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)

AAWXI    49 2454.5 2738.8 -1178.2   2356.5

AAWXII   49 2456.4 2740.8 -1179.2   2358.4 0.0000      0     1.0000

AAWXIII  49 2456.4 2740.8 -1179.2   2358.4 0.0000      0     1.0000

AAWIX    51 2457.6 2753.5 -1177.8   2355.6 2.8749      2     0.2375

AAWX     51 2457.6 2753.5 -1177.8   2355.6 0.0000      0     1.0000

AAWVI    52 2458.6 2760.4 -1177.3   2354.6 0.9410      1     0.3320

AAWVII   52 2458.6 2760.4 -1177.3   2354.6 0.0075      0     <2e-16 ***

AAWVIII  52 2458.6 2760.3 -1177.3   2354.6 0.0094      0     <2e-16 ***

AAWV     54 2458.4 2771.7 -1175.2   2350.4 4.2412      2     0.1200

AAWIII   56 2461.9 2786.9 -1175.0   2349.9 0.4275      2     0.8076

AAWIV    56 2461.9 2786.9 -1175.0   2349.9 0.0000      0     1.0000

AAWII    57 2463.9 2794.7 -1175.0   2349.9 0.0215      1     0.8835

AAWI     60 2467.9 2816.1 -1174.0   2347.9 1.9763      3     0.5773

modelAAW 66 2474.4 2857.3 -1171.2   2342.4 5.5778      6     0.4721

Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1





So many people that I have consulted tell me that I shouldn?t trust a model
of df=zero and advise that I should re-run the models using the drop1
command or simplify the maximal model but in my case I don?t know if I will
ever get results especially that it took more than 16 hours straight with
no luck and I simplified the maximal model to the best I could.



When I checked R documentation, I read that ?when given a sequence of
objects, anova tests the models against one another in the order specified?
based on the AIC value from the smallest to the largest.  However, there
is  a warning that ?the comparison between two or more models will only be
valid if they are fitted to the same dataset. This may be a problem if
there are missing values and R's default of na.action = na.omit is used?,
so I?m assuming this is the case with my models.



Now, should I select model AAWVIII since it has the least p-value (and the
least BIC value compared to AAWVII)?



The formulas of the two models in addition to AAWXI model are as follows.
The first two models are similar to each other except that in AAWVIII the
variable stimulus.presentation is deleted and in model AAWXI the variable
Listgp is deleted.


AAWVI<-glmer(match~Listgp + vowel.quality + stimulus.presentation + context
+ age + freq.+ Listgp:freq.+ Listgp:stimulus.presentation+ context:length+
context:freq.+ context:stimulus.presentation+length:stimulus.presentation+
length:age+ freq.:stimulus.presentation+ freq.:age+
stimulus.presentation:age+ + Listgp:stimulus.presentation +
Listgp:vowel.quality + stimulus.presentation:vowel.quality +
(Listgp|stimulus) + (stimulus.presentation+vowel.quality|listener) , data =
SBAAW, family = "binomial", control=glmerControl(optCtrl=list(maxfun=2e5)),
nAGQ =1)



AAWVIII<-glmer(match~Listgp + vowel.quality + context + age + Listgp:freq.+
Listgp:stimulus.presentation+ context:length+ context:freq.+
context:stimulus.presentation+length:stimulus.presentation+ length:age+
freq.:stimulus.presentation+ freq.:age+ stimulus.presentation:age+ +
Listgp:stimulus.presentation + Listgp:vowel.quality +
stimulus.presentation:vowel.quality + (Listgp|stimulus) +
(stimulus.presentation+vowel.quality|listener) , data = SBAAW, family =
"binomial", control=glmerControl(optCtrl=list(maxfun=2e5)), nAGQ =1)



AAWXI<-glmer(match~vowel.quality + context + age + Listgp:freq.+
Listgp:stimulus.presentation+ context:length+ context:freq.+
context:stimulus.presentation+length:stimulus.presentation+
freq.:stimulus.presentation+ freq.:age+ stimulus.presentation:age+ +
Listgp:stimulus.presentation + Listgp:vowel.quality +
stimulus.presentation:vowel.quality + (Listgp|stimulus) +
(stimulus.presentation+vowel.quality|listener) , data = SBAAW, family =
"binomial", control=glmerControl(optCtrl=list(maxfun=2e5)), nAGQ =1)



I would appreciate your help with this.
#
Dear Shadiya,

You need to do the model selection on a stable dataset. Therefore you
should create a subset which doesn't contain missing values in the
covariates. Use this subset for model selection. Then you can refit the
final model on the total dataset.

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2016-08-30 7:23 GMT+02:00 Shadiya Al Hashmi <saah500 at york.ac.uk>:

  
  
#
Dear Thierry,

Thanks for the tip and for always being the first to respond to my
statistical dilemmas:)

I will try your suggestion now.

Best wishes,

Shadiya

On 30 August 2016 at 10:29, Thierry Onkelinx <thierry.onkelinx at inbo.be>
wrote:

  
  
#
I completely agree that analyzing a stable dataset is essential, but I'd argue that selecting a subset containing only cases with no missing data is actually changing the nature of the population to which your inferences can be generalized. In effect it tacks "and who have no missing data" onto the definition of the target population. That is unlikely to be the population you really want to study. It may also dramatically reduce your sample size, thereby reducing power and precision of the estimates.  

To preserve inference to the original intended population, solve the missing data problem with either some form of imputation or by using a full-information maximum likelihood estimation method that doesn?t throw out cases with missing data. 


Steven J. Pierce, Ph.D.
Associate Director
Center for Statistical Training & Consulting (CSTAT)
Michigan State University

-----Original Message-----
From: Thierry Onkelinx [mailto:thierry.onkelinx at inbo.be] 
Sent: Tuesday, August 30, 2016 3:30 AM
To: Shadiya Al Hashmi <saah500 at york.ac.uk>
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Anova II table, df, drop1 and very complex regression models!

Dear Shadiya,

You need to do the model selection on a stable dataset. Therefore you
should create a subset which doesn't contain missing values in the
covariates. Use this subset for model selection. Then you can refit the
final model on the total dataset.

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2016-08-30 7:23 GMT+02:00 Shadiya Al Hashmi <saah500 at york.ac.uk>:

  
  
#
Can you recommend a convenient full-information ML estimation method
that handles a wide range of cases (hopefully nested+crossed, GLMM +
LMM, etc, hopefully implemented in R) ?

  Ben Bolker
On 16-08-30 08:42 AM, Steven J. Pierce wrote:
#
I'd probably use Mplus (www.statmodel.com) for that, because it implements a very flexible generalized latent variable modeling framework that handles random effects via latent variables. It supports FIML estimation, plus a number of other estimators, including Bayesian estimation via MCMC. While expensive and not open source, it's probably the leading software package for people who do structural equation modeling; it's heavily used in the social sciences and educational research. 

Fortunately, the lavaan package in R replicates some of what Mplus can do. It may be capable of handling the problem, but I haven?t been keeping up with recent development of lavaan to see how much of the more recent Mplus features it is now supporting. 

However, if one goes the imputation route instead, there are several R packages that offer sophisticated support for imputation (Amelia, mi, mice, mitools, mix, pan, VIM, and likely more that I haven't seen/tried yet). Then you don?t need FIML estimation, just whatever normal modeling tools you already use. 


Steven J. Pierce, Ph.D.
Associate Director
Center for Statistical Training & Consulting (CSTAT)
Michigan State University

-----Original Message-----
From: Ben Bolker [mailto:bbolker at gmail.com] 
Sent: Tuesday, August 30, 2016 2:15 PM
To: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Anova II table, df, drop1 and very complex regression models!


  Can you recommend a convenient full-information ML estimation method
that handles a wide range of cases (hopefully nested+crossed, GLMM +
LMM, etc, hopefully implemented in R) ?

  Ben Bolker
On 16-08-30 08:42 AM, Steven J. Pierce wrote: