Skip to content
Prev 320621 / 398506 Next

normal mixture EM not working?

Dear Stat Tistician;

Have you looked at the original distribution graphically? When I use densityplot on that data I am unable to discern even a hint of two separate peaks. Furthermore the density looks peaked versus what I would expect from a "true" normal distribution with those empiric parameters. So I think the algorithm is attempting to estimate the proportions and variances of two distributions with equal means and different standard deviations. That would seem to be a particularly difficult task for a non-deterministic algorithm as this one clearly is. Under these circumstances I thought you might get better results if you constrained the means to be identically zero.

On the other hand my experiments with mixtools::normalmixEM on the clearly separable peaks in the faithful$waiting vector used in that package's help page also gave widely divergent results on estimated mixing proportions as well (even with arbvar=FALSE), so I am now very suspicious about the stability of either the method or possibly the implementation of that method for data of this size:

data(faithful)
[1] 272
[1] 0.4548029 0.5451971
[1] 0.998016838 0.001983162
number of iterations= 5 
   user  system elapsed 
  0.019   0.001   0.021
[1] 0.3609454 0.6390546

I think I could do a lot better just "drawing a line by eyeball"  between the two peaks:
[1] 171
[1] 101

This makes the observation of implausibly high variation in estimated mixing proportions more reproducible (at least if you are on a Mac):
number of iterations= 8
[1] 0.3608581 0.6391419
number of iterations= 3
[1] 0.08257889 0.91742111

(I'm copying the package maintainer, Derek Young .)
R version 3.0.0 beta (2013-03-22 r62364)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grDevices datasets  splines   graphics  utils     stats     methods   base     

other attached packages:
 [1] mixtools_0.4.6    segmented_0.2-9.4 boot_1.3-9        car_2.0-16        nnet_7.3-6        MASS_7.3-26      
 [7] data.table_1.8.8  reshape2_1.2.2    rms_3.6-3         Hmisc_3.10-1      survival_2.37-4   sos_1.3-5        
[13] brew_1.0-6        lattice_0.20-14  

loaded via a namespace (and not attached):
 [1] cluster_1.14.3     colorspace_1.2-1   dichromat_2.0-0    digest_0.6.3       ggplot2_0.9.3.1   
 [6] grid_3.0.0         gtable_0.1.2       labeling_0.1       munsell_0.4        plyr_1.8          
[11] proto_0.3-10       RColorBrewer_1.0-5 scales_0.2.3       stringr_0.6.2      tools_3.0.0
On Mar 30, 2013, at 3:20 AM, Stat Tistician wrote:

            
David Winsemius
Alameda, CA, USA