lmer and p-values
18 messages · Iker Vaquero Alba, Ben Bolker, John Maindonald, Manuel Spínola, Dominick Samperi, Andy Liaw
Iker Vaquero Alba <karraspito at ...> writes:
Dear list members: I am fitting a model with lmer, because I need to fit some nested as well as non-nested random effects in it. I am doing a split-plot simplification, dropping terms from the model and comparing the models with and without each term. When doing an ANOVA between one model and its simplified version, I get a chi-square value with 1 df (df from the bigger model minus df from the simplified one) and an associated p-value. I was just wondering whether it is correct to present these chi-square and p-values as a test of the effect of a certain term in the model. I am a bit confused, because if I were doing this same analysis with lme, I would be getting F-values and associated p-values.
When you do anova() in this context you are doing a likelihood ratio test, which is equivalent to doing an F test with 1 numerator df and a very large (infinite) denominator df. As Pinheiro and Bates 2000 point out, this is dangerous/anticonservative if your data set is small, for some value of "small". Guessing an appropriate denominator df, or using mcmcsamp(), or parametric bootstrapping, or something, will be necessary if you want a more reliable p-value.
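For concreteness, the comparison being described looks roughly like the sketch below; the variable names y, x, g and the data frame mydata are invented for illustration, not taken from the thread.

    library(lme4)

    ## Nested fits by maximum likelihood (REML = FALSE), as needed for
    ## a likelihood ratio test of a fixed effect.
    full    <- lmer(y ~ x + (1 | g), data = mydata, REML = FALSE)
    reduced <- update(full, . ~ . - x)

    ## anova() on two nested ML fits performs the likelihood ratio test;
    ## the chi-square df is the difference in parameter counts (here 1),
    ## which is what the original poster was seeing.
    anova(reduced, full)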
On 03/28/2011 01:04 PM, Iker Vaquero Alba wrote:
Ok, I have had a look at the mcmcsamp() function. If I've got it right, it generates an MCMC sample from the parameters of a model fitted with lmer or a similar function. But my doubt now is: even if I cannot trust the p-values from the ANOVA comparing two models that differ in one term, is it still OK to simplify the model that way until I get my Minimum Adequate Model, and then use mcmcsamp() to get a trustworthy p-value for the terms I'm interested in from this MAM? Or should I use mcmcsamp() directly on my maximal model and simplify it according to the p-values obtained that way? Thank you. Iker
Why are you simplifying the model in the first place? (That is a real question, with only a tinge of prescriptiveness.) Among the active contributors to this list and other R lists, I would say that the most widespread philosophy is that one should *not* do backwards elimination of (apparently) superfluous/non-significant terms in the model. (See myriad posts by Frank Harrell and others.) If you do insist on eliminating terms, then the LRT (anova()) p-values are no more or less reliable for the purposes of elimination than they are for the purposes of hypothesis testing.
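As an aside on the mechanics Iker asks about: under the lme4 versions current at the time (mcmcsamp() has since been removed from the package), the computation looked roughly like the sketch below. Treat it as a historical sketch with invented variable names.

    library(lme4)

    ## Hypothetical model; y, x, g, and mydata are placeholders.
    fit  <- lmer(y ~ x + (1 | g), data = mydata)
    samp <- mcmcsamp(fit, n = 10000)  # MCMC sample of the parameters
    HPDinterval(samp)                 # highest-posterior-density intervals;
                                      # a fixed effect whose interval excludes
                                      # zero plays the role of a "significant" term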
A slightly more accommodating position is that some selection may be acceptable if it makes little difference to the magnitudes of parameter estimates and to the interpretations that can be placed upon them. [Since writing this, I notice that Ben has now posted a message that makes broadly similar follow-up points.]

The usual interpretations of p-values assume, among other things, a known model. This assumption is invalidated if there has been some element of backward elimination or other variable selection; following variable selection, the p-value is no longer, strictly, a valid p-value. Elimination of a term with a p-value greater than, say, 0.15 or 0.2 is however likely to make little difference to estimates of other terms in the model, so it may be a reasonable way to proceed. For this purpose, an anti-conservative (smaller than it should be) p-value will usually serve the purpose.

Nowadays it is of course relatively easy to do a simulation that will check the effect of a particular variable elimination/selection strategy. If there is some use of variable elimination/selection, and anything of consequence hangs on the results, this should surely be standard practice.

John Maindonald email: john.maindonald at anu.edu.au phone: +61 2 (6125)3473 fax: +61 2 (6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27), Australian National University, Canberra ACT 0200. http://www.maths.anu.edu.au/~johnm
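John's suggested simulation check can be sketched in a few lines. Everything below (model, sample sizes, the 0.15 rule) is invented for illustration; the idea is to simulate data in which no fixed effect is real, apply the elimination rule, and see what the strategy does to the p-value of the term you keep.

    library(lme4)

    one_run <- function(n_groups = 10, n_per = 5) {
      g  <- factor(rep(seq_len(n_groups), each = n_per))
      x1 <- rnorm(n_groups * n_per)
      x2 <- rnorm(n_groups * n_per)
      ## No true fixed effects: just a group effect plus noise.
      y  <- rnorm(n_groups)[as.integer(g)] + rnorm(n_groups * n_per)
      full <- lmer(y ~ x1 + x2 + (1 | g), REML = FALSE)
      no2  <- lmer(y ~ x1      + (1 | g), REML = FALSE)
      ## Elimination rule: drop x2 if its LRT p-value exceeds 0.15,
      ## then return the LRT p-value for x1 from whichever model remains.
      if (anova(no2, full)[2, "Pr(>Chisq)"] > 0.15) {
        anova(lmer(y ~ (1 | g), REML = FALSE), no2)[2, "Pr(>Chisq)"]
      } else {
        anova(lmer(y ~ x2 + (1 | g), REML = FALSE), full)[2, "Pr(>Chisq)"]
      }
    }

    ## If the elimination step were harmless, these p-values would be
    ## roughly uniform on (0, 1); systematic departures measure its cost.
    pvals <- replicate(200, one_run())
    hist(pvals)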
On 03/28/2011 06:15 PM, John Maindonald wrote:
Elimination of a term with a p-value greater than, say, 0.15 or 0.2 is however likely to make little difference to estimates of other terms in the model. Thus, it may be a reasonable way to proceed. For this purpose, an anti-conservative (smaller than it should be) p-value will usually serve the purpose.
Note that naive likelihood ratio tests of random effects are likely to be conservative (in the simplest case, the nominal p-value is twice what it should be) because of boundary issues, while tests of fixed effects are probably anticonservative because of finite-size effects (see Pinheiro and Bates 2000 for examples of both cases).
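In code, the halving correction for the boundary case reads as below (invented names again). When a single variance component is tested on the boundary of its parameter space, the null distribution of the LRT statistic is a 50:50 mixture of chi-square(0) and chi-square(1), not chi-square(1).

    library(lme4)

    m1 <- lmer(y ~ x + (1 | g), data = mydata, REML = FALSE)  # with random intercept
    m0 <- lm(y ~ x, data = mydata)                            # without

    lrt     <- as.numeric(2 * (logLik(m1) - logLik(m0)))
    p_naive <- pchisq(lrt, df = 1, lower.tail = FALSE)  # conservative
    p_naive / 2                                         # boundary-corrected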
Ben
On 11-03-29 07:35 AM, Manuel Spínola wrote:
I am not a statistician, but what is the p-value telling me? Isn't the effect size more important? Best, Manuel
Hmm. What's the motivation for your question? The p-value gives you the probability of the observed pattern, or a more extreme one, having occurred if the null hypothesis were true. The effect size (defined in various ways) tells you something about the strength of the observed pattern. Statistical and subject-area (in your case, biological) significance are complementary. A highly statistically significant but biologically trivial effect is a curiosity; a biologically important but statistically insignificant effect means you need more/better data. I don't know if that answers your question.
On Tue, Mar 29, 2011 at 8:45 AM, Manuel Spínola <mspinola10 at gmail.com> wrote:
Thank you very much Ben. Yes, that answers my question. I didn't have any bad intention, but for many non-statisticians I think it is confusing why there is still so much emphasis on p-values. I know this will be controversial, and I don't have the background to argue with a statistician, but I am confused by the way p-values are used in many instances by statisticians. Many well-known statisticians have been very critical of the way p-values are usually used. Here is a link to a list of quotes from many well-known statisticians against null hypothesis significance testing (http://warnercnr.colostate.edu/~anderson/nester.html).
This topic (and this web page) has been discussed at length on this list recently. Check out the archives.

I like to think of p-values and hypothesis testing as a more scientific variant of trial by jury, where the theory to be proved ("as charged") is found guilty by establishing that inconsistent theories (null hypotheses) are unlikely to be true given the observed data. If the null hypothesis is true ("beyond a reasonable doubt"), then the theory to be tested "could not have been at the scene of the crime." Note that, just as in a jury trial, this does not prove that the theory in question is true with absolute certainty.

In practice one usually entertains several possible models or theories and selects the one that seems to explain the data best by eliminating most of the variance in the observations. More precisely, a good model is one where the residual is negligible and looks like "noise."

Dominick
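Dominick's "residual looks like noise" criterion is usually checked by eye. A minimal sketch for a fitted lmer model ('fit' is a placeholder fit as in the earlier examples):

    r <- residuals(fit)
    plot(fitted(fit), r, xlab = "fitted", ylab = "residual")
    abline(h = 0, lty = 2)   # want: no trend, no funnel shape
    qqnorm(r); qqline(r)     # want: roughly straight line (normal-looking noise)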
Some of the quotes:

Yates: "the emphasis given to formal tests of significance ... has resulted in ... an undue concentration of effort by mathematical statisticians on investigations of tests of significance applicable to problems which are of little or no practical importance ... and ... it has caused scientific research workers to pay undue attention to the results of the tests of significance ... and too little to the estimates of the magnitude of the effects they are investigating"

Cochran and Cox: "In many experiments it seems obvious that the different treatments must have produced some difference, however small, in effect. Thus the hypothesis that there is no difference is unrealistic: the real problem is to obtain estimates of the sizes of the differences."

Savage: "Null hypotheses of no difference are usually known to be false before the data are collected ... when they are, their rejection or acceptance simply reflects the size of the sample and the power of the test, and is not a contribution to science".

Kish: "Significance should stand for meaning and refer to substantive matter. ... I would recommend that statisticians discard the phrase 'test of significance'".

Kish: "the tests of null hypotheses of zero differences, of no relationships, are frequently weak, perhaps trivial statements of the researcher's aims ... in many cases, instead of the tests of significance it would be more to the point to measure the magnitudes of the relationships, attaching proper statements of their sampling variation. The magnitudes of relationships cannot be measured in terms of levels of significance".

Nunnally: "the null-hypothesis models ... share a crippling flaw: in the real world the null hypothesis is almost never true, and it is usually nonsensical to perform an experiment with the sole aim of rejecting the null hypothesis".

Nunnally: "If rejection of the null hypothesis were the real intention in psychological experiments, there usually would be no need to gather data".

Yates: "The most commonly occurring weakness ... is ... undue emphasis on tests of significance, and failure to recognise that in many types of experimental work estimates of treatment effects, together with estimates of the errors to which they are subject, are the quantities of primary interest".

Yates: "In many experiments ... it is known that the null hypothesis ... is certainly untrue".

Cox: "Overemphasis on tests of significance at the expense especially of interval estimation has long been condemned".

Kruskal: "it is easy to ... throw out an interesting baby with the nonsignificant bath water. Lack of statistical significance at a conventional level does not mean that no real effect is present; it means only that no real effect is clearly seen from the data. That is why it is of the highest importance to look at power and to compute confidence intervals"

Kruskal: "Because of the relative simplicity of its structure, significance testing has been overemphasized in some presentations of statistics, and as a result some students come mistakenly to feel that statistics is little else than significance testing"

Best, Manuel
On 03/29/2011 04:44 PM, Manuel Spínola wrote:
Dear Dominick, thank you for your message. In my opinion, the relationship of theories (and scientific hypotheses) to hypothesis testing (statistical hypotheses) is not as straightforward as many people think, but certainly the p-value is not going to help much with that relationship. If somebody entertains several possible models, why not compute Pr(Model | data) instead of Pr(data | H0)? Best, Manuel
A couple of points:
* p-values certainly have their problems, but despite their problems they answer a need. Fisher, Neyman, and Pearson were pretty smart guys, and the question that p-values answer ("how likely is it that I would see a pattern this strong, or stronger, if there were really nothing happening?") is one that we often want to ask. It's also nice to have a concise, general statement of the strength of an effect, even if it has flaws (arguably we could all be quoting log-likelihood differences, or standardized regression coefficients, instead).

* Notice how often the quotes that you posted say "overuse", or "undue", or "too much emphasis" (rather than "never" or "forbidden"). Yes, if I had to choose between a p-value and a confidence interval I would take the confidence interval every time -- but then I have to decide what kind of confidence interval I want, and if I decide to use frequentist confidence intervals I am back in the soup again, both with interpretation and with the difficulties (in the mixed model context) of computing them appropriately (a hand-rolled sketch follows after these points).

* I wouldn't object if everyone decided to go Bayesian, but that does have its own cans of worms (deciding on priors, computing [deciding about convergence if using MCMC], etc.). Again, if I had to choose between frequentist *only* or Bayesian *only* I would probably choose Bayesian. The hybrid-Bayesian approaches (e.g. mcmcsamp, post-estimation MCMC in AD Model Builder) choose flat priors on the (perhaps arbitrarily chosen) current scale of the parameters, glossing over details that are sometimes important. (The same goes for the pseudo-Bayesian interpretation of AIC.)
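The hand-rolled sketch promised above: a parametric-bootstrap percentile interval for a fixed-effect slope, one way of computing a frequentist interval "appropriately" in the mixed-model context. Names are invented; simulate() and refit() are the lme4 helpers assumed here.

    library(lme4)

    fit <- lmer(y ~ x + (1 | g), data = mydata, REML = FALSE)
    boot_slopes <- replicate(500, {
      ysim <- unlist(simulate(fit))    # new response drawn from the fitted model
      fixef(refit(fit, ysim))[["x"]]   # refit to it, keep the slope estimate
    })
    quantile(boot_slopes, c(0.025, 0.975))  # percentile bootstrap 95% CI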
I agree that the relations among scientific theory and statistical practices are tough. From Crome 1997:

"18. Use statistical procedures from a range of schools and strictly adhere to their respective methods and interpretation. For example, do a Fisherian significance test properly and interpret it properly. Then set up a formal Neyman-Pearson test and interpret it formally (this means setting up both Type I and II error rates beforehand, among other things). Then do an estimation procedure. Then switch hats and do a Bayesian analysis. Take the results of all four, noting their different behavior, and come to your conclusion. Good analysis and interpretation are as important as the fieldwork, so allot adequate time and resources to both. ..."

Crome, Francis H. J. 1997. Researching tropical forest fragmentation: Shall we keep on doing what we're doing? In Tropical forest remnants: ecology, management, and conservation of fragmented communities, ed. W. F. Laurance and R. O. Bierregard, 485-501. Chicago, IL: University of Chicago Press.

(There is more here that's worth reading.)
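As a footnote to Manuel's Pr(Model | data): one crude route is the BIC approximation to posterior model probabilities, sketched below under the loud assumption of equal prior probability for each model. 'full' and 'reduced' are the placeholder fits from the first example, not objects from the thread.

    bics <- c(full = BIC(full), reduced = BIC(reduced))
    w    <- exp(-0.5 * (bics - min(bics)))  # relative evidence weights
    w / sum(w)                              # approximate Pr(Model | data)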
From: John Maindonald
Yes, the effect size is ultimately more important. But one needs to be somewhat sure that the effect is real, and that it is estimated appropriately. p-values can contribute to a story that gives some smaller or larger confidence that claimed effects are real. They are just one of several routes that contribute to this end; opinions differ on whether, in any particular circumstance, they are the best route.

The discussion that prompted these various comments related to a different, even more controversial use of p-values (and p-value 'alternatives'): their use in excluding or including explanatory terms in a model. Here, there are several related issues:

1) Inference for model parameters should take account of the process that has generated the model under consideration, including any omission of terms judged to be of no statistical consequence. The standard interpretations of p-values apply, strictly, only if there has been no elimination/selection of variables.

2) In models that have certain types of imbalance, parameter estimates can change markedly (even to the point of changing sign), depending on what other terms are in the model.

3) Point 2 argues for choosing the model that is on scientific grounds most reasonable, and sticking with it. If model parameters are important to the subsequent discussion, be sure that their estimates condition on the 'correct' set of other model variables, i.e., that the other variables in the model are the ones required to allow this interpretation.
I'm afraid that all too often the reason models are chosen on "statistical grounds" is the lack of "scientific grounds". Sort of a catch-22, I guess... Even when "scientific grounds" exist, what exactly constitutes them, and how do we know it's not another rabbit (or ozone) hole? Andy
4) One may however allow fine tuning that simplifies the model while changing nothing of consequence (and it really is necessary to check that there are no changes of consequence). p-values may have a limited use in such fine tuning, but for that purpose the p = 0.05 cutoff is not appropriate.

John Maindonald
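The "check that there are no changes of consequence" step in point 4 can be as simple as comparing fixed-effect estimates side by side (placeholder names as in the earlier sketches):

    library(lme4)

    full    <- lmer(y ~ x1 + x2 + (1 | g), data = mydata, REML = FALSE)
    reduced <- update(full, . ~ . - x2)

    shared <- intersect(names(fixef(full)), names(fixef(reduced)))
    cbind(full = fixef(full)[shared], reduced = fixef(reduced)[shared])
    ## Marked shifts or sign changes in the retained coefficients mean the
    ## simplification is not the innocuous fine tuning intended.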
On Tue, Mar 29, 2011 at 7:44 PM, Liaw, Andy <andy_liaw at merck.com> wrote:
I'm afraid that all too often the reason models are chosen on "statistical grounds" is the lack of "scientific grounds". Sort of a catch-22, I guess... Even when "scientific grounds" exist, what exactly constitutes them, and how do we know it's not another rabbit (or ozone) hole? Andy
Yes, this is particularly so when studying social systems or any rapidly evolving system (like the financial markets). In this situation the statistical picture is often just a snapshot that should probably be labeled (conditioned) by the time of observation and the context. In view of this complexity I'm tempted to view p-values and hypothesis testing (when used in this context) as a communication protocol that helps statisticians to reach a consensus, and not as a tool that reveals timeless truths. Dominick