
Compare two normal to one normal

On 23/09/15 13:39, John Sorkin wrote:

You are quite correct; there are subtleties involved here.

The one-component model *is* nested in the two-component model, but is 
nested "ambiguously".

(1) The null (single component) model for a mixture distribution is 
ill-defined.  Note that a single component could be achieved either by 
setting the mixing probabilities equal to (1,0) or (0,1) or by setting
mu_1 = mu_2 and sigma_1 = sigma_2.
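To make point (1) concrete, here is a small hypothetical illustration (not from the original post; the parameter values are made up) showing that both routes collapse the two-component mixture to the very same single-normal density:

```python
import math

def norm_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, lam, mu1, sigma1, mu2, sigma2):
    """Density of a two-component normal mixture with mixing probability lam."""
    return lam * norm_pdf(x, mu1, sigma1) + (1 - lam) * norm_pdf(x, mu2, sigma2)

x = 0.7
# Route 1: mixing probabilities (1, 0) -- the second component is irrelevant.
d1 = mixture_pdf(x, lam=1.0, mu1=0.0, sigma1=1.0, mu2=5.0, sigma2=2.0)
# Route 2: mu_1 = mu_2 and sigma_1 = sigma_2 -- the mixing probability is irrelevant.
d2 = mixture_pdf(x, lam=0.3, mu1=0.0, sigma1=1.0, mu2=0.0, sigma2=1.0)
# Both equal the standard normal density at x.
assert abs(d1 - norm_pdf(x, 0.0, 1.0)) < 1e-12
assert abs(d2 - norm_pdf(x, 0.0, 1.0)) < 1e-12
```

In the first route the parameters of the unused component are completely unidentified; in the second, the mixing probability is. Either way the null sits on the boundary of the parameter space, which is what wrecks the usual asymptotics.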


(2) However you slice it, the parameter values corresponding to the null 
model fall on the *boundary* of the parameter space.

(3) Consequently the asymptotics go to hell in a handcart and the 
likelihood ratio statistic, however you specify the null model, does not 
have an asymptotic chi-squared distribution.

(4) I have a vague idea that there are ways of obtaining a valid 
asymptotic null distribution for the LRT but I am not sufficiently 
knowledgeable to provide any guidance here.

(5) You might be able to gain some insight from delving into the 
literature --- a reasonable place to start would be with "Finite Mixture 
Models" by McLachlan and Peel:

@book{mclachlan2000finite,
  title     = {Finite Mixture Models},
  series    = {Wiley Series in Probability and Statistics},
  author    = {McLachlan, G. and Peel, D.},
  year      = {2000},
  publisher = {John Wiley \& Sons},
  address   = {New York}
}

(6) My own approach would be to do "parametric bootstrapping":

* fit (to the real data) the null model and calculate
   the log-likelihood L1, any way you like
* fit the full model and determine the log-likelihood L2
* form the test statistic LRT = 2*(L2 - L1)
* simulate data sets from the fitted parameters for the null model
* for each such simulated data set calculate a test statistic in the
   foregoing manner, obtaining LRT^*_1, ..., LRT^*_N
* the p-value for your test is then

   p = (m+1)/(N+1)

   where m = the number of LRT^*_i values that are greater than LRT
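The recipe above can be sketched in Python.  This is a hypothetical, self-contained illustration, not code from the original post: the EM fit for the two-component mixture is deliberately crude (fixed iteration count, a single min/max start, and a floor on the sigmas to dodge the degenerate unbounded likelihood); real use would want multiple starts and a convergence check.

```python
import math
import random

def norm_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def fit_one(data):
    """MLE of a single normal; returns (log-likelihood, mu, sigma)."""
    n = len(data)
    mu = sum(data) / n
    sigma = max(math.sqrt(sum((x - mu) ** 2 for x in data) / n), 1e-3)
    return sum(math.log(norm_pdf(x, mu, sigma)) for x in data), mu, sigma

def fit_two(data, iters=60):
    """Crude EM for a two-component normal mixture; returns the log-likelihood."""
    n = len(data)
    mean = sum(data) / n
    s = max(math.sqrt(sum((x - mean) ** 2 for x in data) / n), 1e-3)
    lam, mu1, mu2, s1, s2 = 0.5, min(data), max(data), s, s
    for _ in range(iters):
        # E-step: responsibility of component 1 for each point.
        r = []
        for x in data:
            p1 = lam * norm_pdf(x, mu1, s1)
            p2 = (1 - lam) * norm_pdf(x, mu2, s2)
            r.append(p1 / (p1 + p2))
        n1 = sum(r)
        n2 = n - n1
        if n1 < 1e-6 or n2 < 1e-6:   # a component has emptied out; stop
            break
        # M-step: weighted means, variances, and mixing probability.
        lam = n1 / n
        mu1 = sum(ri * x for ri, x in zip(r, data)) / n1
        mu2 = sum((1 - ri) * x for ri, x in zip(r, data)) / n2
        s1 = max(math.sqrt(sum(ri * (x - mu1) ** 2 for ri, x in zip(r, data)) / n1), 1e-3)
        s2 = max(math.sqrt(sum((1 - ri) * (x - mu2) ** 2 for ri, x in zip(r, data)) / n2), 1e-3)
    return sum(math.log(lam * norm_pdf(x, mu1, s1) +
                        (1 - lam) * norm_pdf(x, mu2, s2)) for x in data)

def bootstrap_pvalue(data, N=99, seed=0):
    """Parametric bootstrap p-value for one normal vs a two-component mixture."""
    rng = random.Random(seed)
    L1, mu0, sig0 = fit_one(data)
    lrt = 2 * (fit_two(data) - L1)
    m = 0
    for _ in range(N):
        # Simulate from the *fitted null* model and recompute the statistic.
        sim = [rng.gauss(mu0, sig0) for _ in data]
        L1s, _, _ = fit_one(sim)
        if 2 * (fit_two(sim) - L1s) > lrt:
            m += 1
    return (m + 1) / (N + 1)
```

For clearly bimodal data (say 40 points near -3 and 40 near +3, unit sd) and N=19 this returns the smallest attainable p-value, 1/20; for genuinely unimodal data the p-value will typically be unremarkable.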

The factor of 2 is of course completely unnecessary.  I just put it in 
"by analogy" with the "real", usual, likelihood ratio statistic.

Note that this p-value is *exact* (not an approximation!) --- for any 
value of N --- when interpreted with respect to the "total observation
procedure" of observing both the real and simulated data.  (But see 
below.) That is, the probability, under the null hypothesis, of 
observing a test statistic "as extreme as" what you actually observed is 
*exactly* (m+1)/(N+1).  See e.g.:

@article{Barnard1963,
  author  = {G. A. Barnard},
  title   = {Discussion of ``{T}he spectral analysis of point
             processes'' by {M}. {S}. {B}artlett},
  journal = {Journal of the Royal Statistical Society, Series {B}},
  volume  = {25},
  year    = {1963},
  pages   = {294}
}

or

@article{Hope1968,
  author  = {A. C. A. Hope},
  title   = {A simplified {M}onte {C}arlo significance test procedure},
  journal = {Journal of the Royal Statistical Society, Series {B}},
  year    = {1968},
  volume  = {30},
  pages   = {582--598}
}

Taking N=99 (or 999) is arithmetically convenient.

However I exaggerate when I say that the p-value is exact.  It would be 
exact if you *knew* the parameters of the null model.  Since you have to 
estimate these parameters the test is (a bit?) conservative.  Note that 
the conservatism would be present even if you eschewed the "exact" test 
and used an "approximate" test with a (very) large value of N.

Generally conservatism (in this context! :-) ) is deemed to be no bad thing.

cheers,

Rolf Turner

P. S.  I think that the mixing parameter must *always* be estimated. 
I.e. even if you knew mu_1, mu_2, sigma_1 and sigma_2 you would still 
have to estimate "lambda".  So you have 5 parameters in your full model.
Not that this is particularly relevant.

R. T.