
Empty cluster / segfault using vanilla kmeans with version 2.15.2

Luca Nanetti

This example dataset breaks kmeans in R 2.15.2, installed from the Belgian
CRAN mirror, on Ubuntu 12.04 LTS 64-bit:
      Day1 Day2 Day3 Day4 Day5 Day6
 [1,]    4    5    5    3    5    5
 [2,]    7    7    6    5    6    6
 [3,]    6    6    5    5    5    5
 [4,]    5    3    4    3    2    4
 [5,]    4    3    2    5    3    2
 [6,]    6    6    6    5    6    6
 [7,]    6    7    6    6    7    6
 [8,]    4    3    5    4    5    5
 [9,]    3    5    5    5    5    6
[10,]    4    5    3    2    4    4
[11,]    7    7    7    5    7    7
[12,]    3    4    2    2    2    2
[13,]    4    6    6    4    6    6
[14,]    5    6    5    6    6    6
[15,]    4    5    5    5    4    3
[16,]    5    6    6    6    6    6
[17,]    7    7    7    6    7    6
[18,]    3    2    3    3    4    2
[19,]    6    5    5    4    5    4
[20,]    5    4    1    5    1    3
[21,]    4    5    5    4    6    5
[22,]    3    4    6    5    6    3
[23,]    2    3    2    3    3    3
[24,]    5    6    5    3    4    5
[25,]    6    6    6    6    6    6
[26,]    5    4    5    5    5    5
[27,]    5    6    6    1    3    6
[28,]    4    4    4    3    3    5
[29,]    6    7    5    5    4    6
[30,]    3    2    2    2    3    2
[31,]    2    4    1    6    4    3
[32,]    4    6    4    5    4    5
[33,]    3    2    2    3    3    3
[34,]    2    3    6    5    4    4
[35,]    2    2    1    1    1    2
[36,]    2    3    2    3    2    3
[37,]    3    6    5    5    3    5
[38,]    7    3    3    7    3    5
[39,]    2    2    4    4    2    4
[40,]    2    4    3    2    3    2

## Define a variable
## Performing kmeans with 100 random starts, several times; 7 times I
## got the 'empty cluster' error
Error: empty cluster: try a better set of initial centers
Error: empty cluster: try a better set of initial centers
Error: empty cluster: try a better set of initial centers
Error: empty cluster: try a better set of initial centers
Error: empty cluster: try a better set of initial centers
Error: empty cluster: try a better set of initial centers
Error: empty cluster: try a better set of initial centers
## The next attempt provokes the segmentation fault. Please note that there is
## nothing special about the 7 times reported above; next time it can happen
## on the very first attempt
*** caught segfault ***
address 0x10, cause 'memory not mapped'
Segmentation fault (core dumped)

That's about it. The attached file was written with write.table(x,
file=...).
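For anyone reproducing the report outside R: with its defaults, write.table() writes space-separated fields, quotes the row and column names, and gives the row-name column no header entry, so the header line has one field fewer than each data line (in R, read.table() on such a file uses the first column as row names automatically). The Python sketch below parses that layout, assuming the default sep and quote settings; the function name read_write_table is hypothetical.

```python
import shlex


def read_write_table(text):
    """Parse text produced by R's write.table(x, file=...) with default settings.

    Assumes space-separated fields, double-quoted names, and a header line with
    one field fewer than the data lines (no header entry for the row names).
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    colnames = shlex.split(lines[0])  # shlex strips the double quotes
    rownames, rows = [], []
    for ln in lines[1:]:
        fields = shlex.split(ln)
        if len(fields) != len(colnames) + 1:
            raise ValueError(f"unexpected field count: {ln!r}")
        rownames.append(fields[0])  # first field is the quoted row name
        rows.append([float(v) for v in fields[1:]])
    return colnames, rownames, rows


# First three rows of the attached file.
sample = '''"Day1" "Day2" "Day3" "Day4" "Day5" "Day6"
"1" 4 5 5 3 5 5
"2" 7 7 6 5 6 6
"3" 6 6 5 5 5 5
'''
cols, names, rows = read_write_table(sample)
print(len(rows), "rows x", len(cols), "columns")  # 3 rows x 6 columns
```

Run on the full attachment, the same function should yield 40 rows by 6 columns, matching the matrix printed above.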

I clustered the same dataset with R 2.14.1 on the same computer and OS,
using nstart=1000, and repeated that 1000 times without the slightest
problem. Moreover, at the risk of repeating myself, the 'empty cluster'
error is plausibly the symptom of a bug, because it _should_ never happen
with the Hartigan-Wong algorithm (the default for kmeans).
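As a point of comparison only: in a plain Lloyd-style k-means (a simpler algorithm than R's default Hartigan-Wong, swapped in here purely for illustration) a cluster can legitimately end up empty, for example when two starting rows have identical values and ties always go to the lower-indexed center. The minimal Python sketch below wraps such a Lloyd iteration in a multi-start loop that keeps the run with the lowest total within-cluster sum of squares, roughly the idea behind nstart, and counts and skips starts that hit an empty cluster instead of aborting. The names lloyd and kmeans_multistart are hypothetical; this is not R's implementation and does not locate the 2.15.2 failure.

```python
import random


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(points, centers, max_iter=100):
    """One Lloyd-style run (NOT Hartigan-Wong) from the given initial centers.

    Raises ValueError("empty cluster") if a center ends up with no points.
    """
    centers = [list(c) for c in centers]  # do not mutate the caller's list
    k = len(centers)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center; ties go to the lowest index.
        new_labels = [min(range(k), key=lambda j: sqdist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each center to the mean of its points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:
                raise ValueError("empty cluster")
            centers[j] = [sum(col) / len(members) for col in zip(*members)]
    wss = sum(sqdist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, wss


def kmeans_multistart(points, k, nstart, seed=0):
    """Best of `nstart` Lloyd runs, each started from k distinct random rows.

    Returns (best_result_or_None, number_of_starts_that_hit_an_empty_cluster).
    """
    rng = random.Random(seed)
    best, failures = None, 0
    for _ in range(nstart):
        init = [points[i] for i in rng.sample(range(len(points)), k)]
        try:
            result = lloyd(points, init)
        except ValueError:
            failures += 1
            continue
        if best is None or result[2] < best[2]:
            best = result
    return best, failures


pts = [[0, 0], [0, 1], [10, 10], [10, 11]]
best, failures = kmeans_multistart(pts, k=2, nstart=10, seed=42)
print(best[2], failures)
```

Skipping a failed start, rather than stopping the whole call, is a design choice of this sketch; with identical starting centers, for instance lloyd(pts, [[0, 0], [0, 0]]), the second center receives no points and the start is counted as a failure.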

Kind regards,
and thanks again for your time.

Luca Nanetti


On Sat, Feb 9, 2013 at 8:52 PM, Uwe Ligges
<ligges at statistik.tu-dortmund.de> wrote:

-------------- next part --------------
"Day1" "Day2" "Day3" "Day4" "Day5" "Day6"
"1" 4 5 5 3 5 5
"2" 7 7 6 5 6 6
"3" 6 6 5 5 5 5
"4" 5 3 4 3 2 4
"5" 4 3 2 5 3 2
"6" 6 6 6 5 6 6
"7" 6 7 6 6 7 6
"8" 4 3 5 4 5 5
"9" 3 5 5 5 5 6
"10" 4 5 3 2 4 4
"11" 7 7 7 5 7 7
"12" 3 4 2 2 2 2
"13" 4 6 6 4 6 6
"14" 5 6 5 6 6 6
"15" 4 5 5 5 4 3
"16" 5 6 6 6 6 6
"17" 7 7 7 6 7 6
"18" 3 2 3 3 4 2
"19" 6 5 5 4 5 4
"20" 5 4 1 5 1 3
"21" 4 5 5 4 6 5
"22" 3 4 6 5 6 3
"23" 2 3 2 3 3 3
"24" 5 6 5 3 4 5
"25" 6 6 6 6 6 6
"26" 5 4 5 5 5 5
"27" 5 6 6 1 3 6
"28" 4 4 4 3 3 5
"29" 6 7 5 5 4 6
"30" 3 2 2 2 3 2
"31" 2 4 1 6 4 3
"32" 4 6 4 5 4 5
"33" 3 2 2 3 3 3
"34" 2 3 6 5 4 4
"35" 2 2 1 1 1 2
"36" 2 3 2 3 2 3
"37" 3 6 5 5 3 5
"38" 7 3 3 7 3 5
"39" 2 2 4 4 2 4
"40" 2 4 3 2 3 2