
What to do when a factor term has several p values?

2 messages · Toby Marthews, Greg Snow

#
Dear Very-patient Mixed-modelling list,

Thank you very much for your replies to my nesting question earlier today. EXTREMEly helpful! It seems I'm tripping over a lot of basic misconceptions with this LME application.

I am running an lme fit with two categorical fixed effects (in this case roostsitu which is roosting situation of some birds - nestbox, tree, inside or other - and mnth=Jan,Nov) and I am trying to simplify the model, i.e. considering whether there is a significant interaction between mnth and roostsitu when measuring the mass of these birds. According to the Fixed effects table of the summary.lme I have 3 p-values (0.1802, 0.3683 and 0.5474) so there's no significant interaction for any of the levels of roostsitu (readout below).

I have tried and failed to create an example to show this, but say there were another factor FF in the LME model and I were trying to follow a model simplification process based on these p-values. Further suppose that the p-value of roostsitu:FF were 0.400. There's a question here whether I would remove roostsitu:FF or roostsitu:mnth from the model first during my model simplification process.

(1) If I'm always supposed to consider the maximum p-value across all levels of a factor, then roostsitu:mnth scores 0.5474 which is >0.400 and it goes out first
(2) If I'm always supposed to take the mean p-value then roostsitu:mnth will score mean(c(0.1802,0.3683,0.5474))=0.3653 which is <0.400 so roostsitu:FF will go out first.
(3) Or some other calculation?

Is there a basic principle or rule I'm missing here regarding what to do in the case of multi-level factors? I would really appreciate someone telling me which option is the right one. I have just spent >1 hour searching a large number of websites and leafed through Pinheiro & Bates again but can't find an answer to this. Lots of websites say to use p-values (referencing Crawley generally) but I need a bit more detail than is in Crawley, it seems.

Thanks very much!
Toby Marthews
Linear mixed-effects model fit by REML
 Data: NULL 
       AIC      BIC    logLik
  449.6082 472.3749 -214.8041

Random effects:
 Formula: ~1 | subject
        (Intercept) Residual
StdDev:   0.5868961 4.165333

Fixed effects: stmass ~ mnth * roostsitu 
                          Value Std.Error DF  t-value p-value
(Intercept)                83.6  1.330205 36 62.84747  0.0000
mnthJan                     7.2  1.862793 36  3.86516  0.0004
roostsitunest-box          -4.2  1.881193 36 -2.23263  0.0319
roostsituinside            -5.0  1.881193 36 -2.65789  0.0117
roostsituother             -8.2  1.881193 36 -4.35893  0.0001
mnthJan:roostsitunest-box   3.6  2.634388 36  1.36654  0.1802
mnthJan:roostsituinside     2.4  2.634388 36  0.91103  0.3683
mnthJan:roostsituother      1.6  2.634388 36  0.60735  0.5474
 Correlation: 
                          (Intr) mnthJn rstst- rststn rststt mntJ:- mnthJn:rststn
mnthJan                   -0.700                                                 
roostsitunest-box         -0.707  0.495                                          
roostsituinside           -0.707  0.495  0.500                                   
roostsituother            -0.707  0.495  0.500  0.500                            
mnthJan:roostsitunest-box  0.495 -0.707 -0.700 -0.350 -0.350                     
mnthJan:roostsituinside    0.495 -0.707 -0.350 -0.700 -0.350  0.500              
mnthJan:roostsituother     0.495 -0.707 -0.350 -0.350 -0.700  0.500  0.500       

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-1.75548143 -0.76870435 -0.08640394  0.70218233  2.16928300 

Number of Observations: 80
Number of Groups: 40

               numDF denDF   F-value p-value
(Intercept)        1    36 31143.554  <.0001
mnth               1    36    95.458  <.0001
roostsitu          3    36    10.614  <.0001
mnth:roostsitu     3    36     0.657  0.5838
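
As a quick arithmetic check, the numbers printed above can be fed back through base R. This is only a sketch: it uses the rounded statistics from the readout, so small rounding differences are expected, and the variable names are illustrative. It shows that the last row of the second table is a single F-test on 3 and 36 degrees of freedom for the whole mnth:roostsitu term, whereas the three p-values in the summary are separate t-tests of individual coefficients.

```r
# Whole-term test: one F statistic covering all three interaction coefficients
p_term <- pf(0.657, df1 = 3, df2 = 36, lower.tail = FALSE)

# Per-coefficient tests: two-sided t-tests, each on 36 denominator df
t_vals <- c(1.36654, 0.91103, 0.60735)
p_coef <- 2 * pt(abs(t_vals), df = 36, lower.tail = FALSE)

print(round(p_term, 4))  # compare with 0.5838 in the anova table
print(round(p_coef, 4))  # compare with 0.1802, 0.3683, 0.5474 in the summary
```

Neither the maximum nor the mean of the three coefficient p-values equals the whole-term p-value; they answer different questions.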
#
If there were set rules that we could just tell you, then we could also just tell the computer and there would be no need for you (or any other human). Answering your questions does not require a set of rules, but rather an understanding of your data and your question(s) about the data: things that you should know much better than we or the computer do.

Does it even make sense to collapse partial factors?  Or should you only be considering a whole factor?
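
One way to consider the whole factor at once is sketched below in R with nlme. The data frame `birds` is simulated purely so that the code runs; it is NOT the poster's data, and the effect sizes in it are arbitrary assumptions. The variable names mirror the readout above. The two models are fitted by ML rather than REML because REML likelihoods are not comparable between models with different fixed effects.

```r
library(nlme)

set.seed(1)
# Simulated stand-in data (hypothetical): 40 subjects, each measured in Nov and Jan
situ_levels <- c("tree", "nest-box", "inside", "other")
birds <- expand.grid(subject = factor(1:40),
                     mnth = factor(c("Nov", "Jan"), levels = c("Nov", "Jan")))
subj_id <- as.integer(birds$subject)
# Subjects 1-10 get "tree", 11-20 "nest-box", and so on
birds$roostsitu <- factor(situ_levels[(subj_id - 1) %/% 10 + 1],
                          levels = situ_levels)
subj_eff <- rnorm(40, sd = 2)  # arbitrary between-subject variation
birds$stmass <- 84 + 7 * (birds$mnth == "Jan") +
  subj_eff[subj_id] + rnorm(80, sd = 4)

# Full model with the interaction, and the same model without it
full <- lme(stmass ~ mnth * roostsitu, random = ~ 1 | subject,
            data = birds, method = "ML")
reduced <- lme(stmass ~ mnth + roostsitu, random = ~ 1 | subject,
               data = birds, method = "ML")

lrt <- anova(full, reduced)  # one likelihood-ratio test for the whole interaction
print(lrt)

term_tests <- anova(full)    # sequential F-tests, one row per model term
print(term_tests)
```

Either table gives one p-value per term, on 3 degrees of freedom for mnth:roostsitu, so there is no need to combine the per-level p-values from the coefficient table.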

Do the interactions make sense scientifically?  Which is of more interest to you?

Why are you simplifying the model?  Stepwise procedures bias the final estimates and often don't answer the real question(s) of interest.

Have you considered that the p-values that you are looking at may not be as meaningful as you had hoped?

Learn about your data and the questions that are of interest to you before worrying about set rules that lead off on probably meaningless tangents.