
Are likelihood approaches frequentist?

9 messages · Paulo Inácio de Knegt López de Prado, Farrar.David at epamail.epa.gov, Rubén Roa +2 more

Dear r-sig-ecology users

Here follow the messages I exchanged with Ben Bolker last week about the
likelihood and frequentist approaches. We both would like to open this topic
for discussion in the list.

Best wishes

Paulo

----------------------------------------------------------------------------
Dear Dr. Bolker, 

I am puzzled why some authors treat likelihood approaches as
frequentist, as it seems you did on page 13 of your book 'Ecological Models
and Data'.
This sounds odd to me because what brought my attention to likelihood was
Richard Royall's book 'Statistical Evidence'. His framing of a paradigm
based on the likelihood principle, and the clear distinction he makes
between this paradigm and the frequentist and Bayesian approaches, look
quite convincing to me.

I agree with him that we use likelihood criteria to identify, among
competing hypotheses, the one that attributes the highest probability to a
given dataset. If I understood correctly, this is what Royall calls the
'evidence value' of a dataset for a hypothesis vis-à-vis other
hypotheses. I also like his idea that the role of statistics in science
is just to gauge this evidence value, no less, no more.

This approach differs from the frequentist one in that the sample
space is irrelevant: other datasets that might have been observed do not
affect the evidence value of the observed dataset. My favourite example is
the comparison of binomial and negative binomial experiments on coin
tossing, in sections 1.11 and 1.12 of his book.
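Royall's coin-tossing comparison can be sketched numerically. This is a hypothetical illustration (the counts 9 heads and 3 tails are my own choice, not Royall's exact numbers): the binomial and negative binomial designs give likelihood functions that differ only by a constant factor, so every likelihood ratio, and hence the evidence, is identical under both designs.

```python
from math import comb

# Hypothetical data (my own choice): 9 heads and 3 tails observed.
h, t = 9, 3

def binomial_lik(p):
    # Binomial design: the number of tosses n = h + t was fixed in advance.
    return comb(h + t, h) * p**h * (1 - p)**t

def neg_binomial_lik(p):
    # Negative binomial design: tossing continued until the t-th tail,
    # so the last toss is a tail and only h + t - 1 tosses can be arranged.
    return comb(h + t - 1, h) * p**h * (1 - p)**t

# The two likelihood functions differ only by a constant factor, so the
# likelihood ratio for any pair of hypotheses about p is identical:
p1, p2 = 0.75, 0.5
lr_binomial = binomial_lik(p1) / binomial_lik(p2)
lr_neg_binomial = neg_binomial_lik(p1) / neg_binomial_lik(p2)
print(lr_binomial, lr_neg_binomial)  # same value under both designs
```

Under the frequentist paradigm, by contrast, the two designs have different sample spaces and can yield different p-values for the very same observed tosses, which is exactly the point of Royall's example.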

I am not an "orthodox likelihoodist"; on the contrary, I agree with the
pragmatic view you express in your book. I'd just like to understand
the key differences among the available statistical tools, in order to make
good pragmatic use of them. I'd really appreciate it if you could help me
with this.

Best wishes

Paulo
-----------------------------------------------------------------------------
--
Paulo Inácio de Knegt López de Prado
Depto. de Ecologia - Instituto de Biociências - USP
#
Paulo Inácio de Knegt López de Prado wrote:
Paulo,
The likelihood function is the central concept of statistical inference,
so working with the likelihood you can have Bayesian, frequentist
(better called sampling-distribution), or likelihoodist inference,
depending on what you do with your likelihoods. In Bayesian inference the
likelihood function updates prior opinion by bringing the data into the
inference; in sampling-distribution (a.k.a. frequentist) inference it
allows the building of better confidence intervals, by finding in the
sample space the likelihood values that could have occurred had data
similar to yours been obtained; and in the direct-likelihood approach the
likelihood is used directly to compare two hypotheses, or equivalently to
build direct-likelihood intervals.

For example, the likelihood ratio test (not to be confused with the pure
likelihood ratio, or difference in support), based on a limiting
chi-square distribution, is a likelihood-based frequentist method.
Frequentist statisticians evaluate the likelihood from the sample, and
then proceed to evaluate the likelihood for other potential samples, thus
building their confidence intervals and p-values. Bayesian and
likelihoodist statisticians, on the other hand, only use the likelihood
evaluated at the actual sample that was obtained. From that point of view
one can say that Bayesians and likelihoodists are closer to each other
than to frequentists; however, both Bayesians and frequentists base their
inference on probabilities (posterior probabilities or error rates),
whereas likelihoodists base their inference on, well, likelihood only.
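These two uses of the same likelihood can be made concrete with a small binomial sketch. The data (14 successes in 20 trials), the null value p = 0.5, and the 1/8 benchmark are my own illustrative choices, not from the thread: the same log-likelihood feeds either a frequentist chi-square p-value or a pure likelihood interval that never refers to the sample space.

```python
from math import comb, log, exp, sqrt, erfc

# Hypothetical data (my own choice): k successes in n Bernoulli trials.
n, k = 20, 14
phat = k / n  # maximum likelihood estimate of the success probability

def loglik(p):
    # Binomial log-likelihood of the observed data at parameter value p.
    return log(comb(n, k)) + k * log(p) + (n - k) * log(1 - p)

# Frequentist use: likelihood ratio test of H0: p = 0.5, referring
# W = 2 * (logL(phat) - logL(p0)) to a chi-square with 1 df.
w = 2 * (loglik(phat) - loglik(0.5))
p_value = erfc(sqrt(w / 2))  # chi-square(1 df) tail probability

# Direct-likelihood use: the 1/8 likelihood interval, i.e. all p whose
# likelihood is at least 1/8 of the maximum (Royall's benchmark).
# No reference to the sample space is needed.
interval = [p / 1000 for p in range(1, 1000)
            if exp(loglik(p / 1000) - loglik(phat)) >= 1 / 8]
print(f"LRT p-value: {p_value:.3f}")
print(f"1/8 likelihood interval: approx. [{min(interval):.3f}, {max(interval):.3f}]")
```

The p-value requires imagining other samples under H0; the likelihood interval is a statement only about the likelihood function of the data actually observed.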
Royall's points are very convincing indeed; at least they were for me
too. Royall's concept of the evidence in the sample about competing
hypotheses, and of approximate likelihoods for problems with nuisance
parameters, plus Edwards' mathematical proofs of the properties of the
support function, plus Jim Lindsey's arguments about Akaike's index in
model selection, provide a complete theory of statistical inference
based exclusively on the likelihood, IMHO.
"There is nothing more practical than a good theory." I'm not sure who
was the original author of that quote (in a book I read long ago it was
attributed to Einstein), but it applies here. Likelihoodist,
frequentist, and Bayesian inferences are not compatible, especially
likelihoodist and Bayesian versus frequentist, so the pragmatist who
changes allegiance is making an error at some point.
Two other great statisticians that subscribe to the likelihoodist school
of inference are Jim Lindsey and John Nelder.

At least once a year I hear someone at a meeting say that there are two
modes of inference: frequentist and Bayesian. That this sort of nonsense
should be so regularly propagated shows how much we have to do. To begin
with there is a flourishing school of likelihood inference, to which I
belong.

I tend to think there is a place for Bayesian inference in prediction.

Rubén
#
Rubén Roa-Ureta wrote:
Two other great statisticians that subscribe to the likelihoodist school
of inference are Jim Lindsey and John Nelder.
Sorry, the above is a quote from John Nelder, The Statistician, vol. 48,
issue 2, p. 264.
Rubén
#
Thanks, Rubén

My point with this topic was to clarify that the likelihood-based
approach is a distinct paradigm in statistical inference, and that there
are people in biology applying it successfully.

I agree with you that this point should be stressed more, especially
for biologists. Taper & Lele's "The Nature of Scientific Evidence"
(University of Chicago Press, 2004) is a great help in this respect.

Could you indicate the best works by Nelder and Lindsey that could
contribute to this point?

Best wishes

Paulo

Rubén Roa-Ureta wrote:
[snip]
#
prado wrote:
It's a very good point to make. Another important paper is Rubin's paper
on missing data (Biometrika 63:581-592, 1976). There Rubin basically
shows that it is easier to build statistical models for data with
Bayesian and likelihoodist inference, because the mechanism generating
missing data can be ignored if the missing data are missing at random,
whereas in sampling-distribution inference this condition is not
sufficient: to ignore the mechanism generating missing data it is also
necessary that the observed data be observed at random. Many common
scientific studies involve missing data, such as random sampling from
finite populations, randomized experimental set-ups, etc.
Thanks for this reference; I had missed it.
In the case of Nelder, I only know of his personal statement in the
quote that I gave by mistake in my first post, complemented in the
second post.
Lindsey has a very interesting paper in The Statistician (apart from his
'heresies' paper): 'Relationship between sample size, model selection and
likelihood regions, and scientifically important differences', The
Statistician 48:401-411.
Regards
Rubén
#
As Ben pointed out, the key difference between pure likelihood approaches and
frequentist approaches is the addition of a layer of "significance"
assessment based on the idea of repeated experimentation. (The term
"frequentist" has been stretched in a variety of directions now, perhaps due
to lazy writing, so sometimes it is unclear what's included under the
umbrella.)

In his 2001 book "In All Likelihood: Statistical Modelling and Inference
Using Likelihood", Yudi Pawitan refers to pure likelihood inference as
"Fisher's third way", a compromise between frequentist and Bayesian
approaches that began with Fisher himself. Inference based strictly on the
likelihood function is not probabilistic, so would not conform to either of
these two other paradigms.

In a model selection context, which can be applied in most ecological
situations, one often (always?) does not need "significance" assessment and
can turn instead to model selection criteria for probabilistic statements
about the evidence. Of course, once you plot estimates with confidence
intervals you've entered the "gray area".
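The model-selection route Dave describes can be sketched with a toy AIC comparison (hypothetical data of my own choosing; AIC = -2 log L + 2k, with k the number of estimated parameters):

```python
from math import comb, log

# Hypothetical data (my own choice): 14 successes in 20 trials.
n, k = 20, 14

def loglik(p):
    # Binomial log-likelihood at parameter value p.
    return log(comb(n, k)) + k * log(p) + (n - k) * log(1 - p)

# Candidate models for the success probability:
#   M0: p fixed at 0.5      -> 0 estimated parameters
#   M1: p estimated as k/n  -> 1 estimated parameter
# AIC = -2 * logL + 2 * (number of estimated parameters)
aic_m0 = -2 * loglik(0.5) + 2 * 0
aic_m1 = -2 * loglik(k / n) + 2 * 1
delta = aic_m0 - aic_m1  # positive values favour M1
print(aic_m0, aic_m1, delta)
```

The model with the lower AIC is better supported by the data; no repeated-sampling "significance" statement is made at any point, which is what distinguishes this route from the frequentist layer described above.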
prado wrote:
[snip]
#
Dave Hewitt wrote:
I think Donald Rubin gave the right term, sampling-distribution
inference, because it is an inference based on inspection of the sample
space. 'Frequentist' is not precise, because a likelihoodist can
subscribe to a strictly frequentist view of probabilities (e.g. Edwards)
but still think that probabilities are not the correct tool for
inferential statements.

It seems to me that in the area of inference Fisher had three offspring:
significance tests/confidence intervals, direct likelihood, and fiducial
inference. With respect to the first child he was a bit embarrassed. He
wrote in his 1959 book: "Objection has sometimes been made that the
method of calculating Confidence Limits by setting an assigned value
such as 1% on the frequency of observing [the test statistic] or less
[...] is unrealistic in treating the values less than [the test
statistic], which have not been observed, in exactly the same manner as
the value of [the test statistic] which is the one that has been
observed. This feature is indeed not very defensible, save as an
approximation" (p. 68).

His favourite child appeared to be fiducial inference, but not many
people understood it. It looks like his favourite was ignored, while the
one he was a bit embarrassed about prospered. But we have to see what
happens with the other child, direct likelihood; maybe it prevails at
the end of the day.
[snip]
Rubén