P value for a large number of degrees of freedom in lmer

I need to redraft the final sentence of the first paragraph,
to read: "The consequence is that effects that are well within
the bounds of statistical variation may, according to the
usual rituals, appear statistically significant."
----------------------------------------------------------------------------

There are other considerations, which may often be more
serious.  In any observational dataset, there is almost
bound to be structure.  This arises in different areas in 
different ways, but some of the possibilities are:
1) a time element
2) a space element
3) a location or culture or group or family element
4) an effect from collection instrument or person.

So the observations are not iid, or even independent, something
we might be expected to know about on this list.  The
correlations will often be positive.  Even after multi-level
or spatial models have been used to take out what is
thought to be the structure, there will often be structure 
left.  The consequence is that effects that are well within
the bounds of statistical variation may, according to the
usual rituals, appear statistically significant.
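
A small simulation can make this concrete.  The sketch below is
illustrative only: the function name, the cluster counts and the
variances are all assumptions of mine, not taken from any real dataset.
It generates grouped data with no true difference between two arms,
adds a shared random shift within each group (so observations in the
same group are positively correlated), and then applies a two-sample
z-test that wrongly treats every observation as independent.

```python
import math
import random


def mean_and_variance(values):
    """Return the sample mean and the unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var


def naive_false_positive_rate(cluster_sd, n_clusters=20, cluster_size=50,
                              n_sims=400, seed=1):
    """Share of simulations in which a naive z-test rejects at the 5% level.

    There is no true difference between the two arms.  Each cluster gets
    a shared shift with standard deviation `cluster_sd`; the test ignores
    the clustering.  All settings are hypothetical.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        arms = ([], [])
        for c in range(n_clusters):
            shift = rng.gauss(0.0, cluster_sd)  # shared within the cluster
            arms[c % 2].extend(shift + rng.gauss(0.0, 1.0)
                               for _ in range(cluster_size))
        mean_a, var_a = mean_and_variance(arms[0])
        mean_b, var_b = mean_and_variance(arms[1])
        # Standard error that assumes all observations are independent
        se = math.sqrt(var_a / len(arms[0]) + var_b / len(arms[1]))
        if abs((mean_a - mean_b) / se) > 1.96:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    print("independent data:", naive_false_positive_rate(cluster_sd=0.0))
    print("clustered data:  ", naive_false_positive_rate(cluster_sd=1.0))
```

With cluster_sd = 0 the rejection rate should sit near the nominal
0.05; with cluster_sd = 1 it should be far higher, although nothing
real is going on.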

There are other problems.  Some variables may be measured
very inaccurately.  When such a variable is used on its own, the
measurement error reduces the chances of finding a significant
effect, catastrophically if the error is of the same order of
magnitude as the SD of that variable.
If other accurately measured explanatory variables are included
in the same analysis, they may appear falsely significant.  This
sort of issue has been extensively canvassed in connection
with the use of food frequency questionnaire (FFQ) measuring
instruments in large-scale studies of the effect of diet on disease.
See for example:
Schatzkin, A.; Kipnis, V.; Carroll, R.; Midthune, D.; Subar, A.; Bingham, S.; Schoeller, D.; Troiano, R.; and Freedman, L., 2003. A comparison of a food frequency questionnaire with a 24-hour recall for use in an epidemiological cohort study: results from the biomarker-based Observing Protein and Energy Nutrition (OPEN) study. International Journal of Epidemiology, 32:1054-1062.
Here was an instrument that many thought adequately accurate.
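
Both effects can be seen in a toy simulation.  What follows is a
sketch under assumptions of my own (the variable names, a correlation
of 0.7, and a measurement error SD equal to the SD of the true
variable); it is not a reanalysis of the OPEN data.  The outcome y
depends only on a true exposure x.  We observe w, a noisy version of
x, and z, which is measured accurately, is correlated with x, and has
no effect of its own on y.

```python
import math
import random


def simulate(n=20000, rho=0.7, error_sd=1.0, seed=2):
    """Return (w, z, y): noisy exposure, accurate covariate, outcome."""
    rng = random.Random(seed)
    w, z, y = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                     # true exposure, SD 1
        z.append(rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0))
        w.append(x + rng.gauss(0.0, error_sd))      # x measured with error
        y.append(x + rng.gauss(0.0, 1.0))           # z has NO effect on y
    return w, z, y


def cov(a, b):
    """Unbiased sample covariance of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b)
               for ai, bi in zip(a, b)) / (len(a) - 1)


def fit(w, z, y):
    """Least-squares slopes: y on w alone, then y on w and z together."""
    sww, szz, swz = cov(w, w), cov(z, z), cov(w, z)
    swy, szy = cov(w, y), cov(z, y)
    slope_w_alone = swy / sww
    # Solve the 2x2 normal equations for the joint fit
    det = sww * szz - swz ** 2
    b_w = (swy * szz - szy * swz) / det
    b_z = (szy * sww - swy * swz) / det
    return slope_w_alone, b_w, b_z


if __name__ == "__main__":
    slope_w_alone, b_w, b_z = fit(*simulate())
    print("y ~ w:     slope for w =", round(slope_w_alone, 3))
    print("y ~ w + z: slope for w =", round(b_w, 3),
          " slope for z =", round(b_z, 3))
```

Under these assumptions the slope for w alone is attenuated to about
half the true value of 1, and in the joint fit z picks up a
coefficient of roughly 0.46 although its true effect is zero.  With
samples of this size such a coefficient would be declared highly
significant.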

These problems may of course affect all observational studies.
Deficiencies in the data and in the modeling (because some
structure is not accounted for) become more likely to show up
as the modeling becomes more sensitive to smallish, but
perhaps still consequential, effects.

In modest-sized experiments, careful design can largely
avoid such problems.  In experiments where the number
of subjects is very large, the same sorts of problems will
almost inevitably appear.  Minor deviations from the
protocol become almost impossible to avoid.

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
