
Empty cluster / segfault using vanilla kmeans with version 2.15.2

This example dataset breaks kmeans() in R 2.15.2, installed from
the Belgian CRAN mirror, on 64-bit Ubuntu 12.04 LTS:
      Day1 Day2 Day3 Day4 Day5 Day6
 [1,]    4    5    5    3    5    5
 [2,]    7    7    6    5    6    6
 [3,]    6    6    5    5    5    5
 [4,]    5    3    4    3    2    4
 [5,]    4    3    2    5    3    2
 [6,]    6    6    6    5    6    6
 [7,]    6    7    6    6    7    6
 [8,]    4    3    5    4    5    5
 [9,]    3    5    5    5    5    6
[10,]    4    5    3    2    4    4
[11,]    7    7    7    5    7    7
[12,]    3    4    2    2    2    2
[13,]    4    6    6    4    6    6
[14,]    5    6    5    6    6    6
[15,]    4    5    5    5    4    3
[16,]    5    6    6    6    6    6
[17,]    7    7    7    6    7    6
[18,]    3    2    3    3    4    2
[19,]    6    5    5    4    5    4
[20,]    5    4    1    5    1    3
[21,]    4    5    5    4    6    5
[22,]    3    4    6    5    6    3
[23,]    2    3    2    3    3    3
[24,]    5    6    5    3    4    5
[25,]    6    6    6    6    6    6
[26,]    5    4    5    5    5    5
[27,]    5    6    6    1    3    6
[28,]    4    4    4    3    3    5
[29,]    6    7    5    5    4    6
[30,]    3    2    2    2    3    2
[31,]    2    4    1    6    4    3
[32,]    4    6    4    5    4    5
[33,]    3    2    2    3    3    3
[34,]    2    3    6    5    4    4
[35,]    2    2    1    1    1    2
[36,]    2    3    2    3    2    3
[37,]    3    6    5    5    3    5
[38,]    7    3    3    7    3    5
[39,]    2    2    4    4    2    4
[40,]    2    4    3    2    3    2

## Define a variable
## Performing kmeans with 100 random starts, several times; for 7 times I
##  get the 'empty cluster' error
Error: empty cluster: try a better set of initial centers
Error: empty cluster: try a better set of initial centers
Error: empty cluster: try a better set of initial centers
Error: empty cluster: try a better set of initial centers
Error: empty cluster: try a better set of initial centers
Error: empty cluster: try a better set of initial centers
Error: empty cluster: try a better set of initial centers
## The next attempt provokes the segmentation fault. Please note that there is
##  nothing special about the 7 times reported above; next time it can happen
##  on the very first attempt
*** caught segfault ***
address 0x10, cause 'memory not mapped'
Segmentation fault (core dumped)



That's about it. The attached file was written with write.table(x,
file=...).
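
For anyone who wants to load the attachment back into R, the round trip
through write.table() and read.table() can be sketched roughly as below.
This is only an illustration under assumptions: the temporary file stands in
for the real attachment (its name is not given in this message), and only the
first three rows of the posted matrix are used to keep the sketch short.

```r
# First three rows of the posted matrix, copied from this message
x <- matrix(c(4, 5, 5, 3, 5, 5,
              7, 7, 6, 5, 6, 6,
              6, 6, 5, 5, 5, 5),
            ncol = 6, byrow = TRUE,
            dimnames = list(NULL, paste0("Day", 1:6)))

# ASSUMPTION: a temporary file stands in for the real attachment
f <- tempfile(fileext = ".txt")
write.table(x, file = f)   # defaults: quoted header and quoted row names

# The header line has one field fewer than the data lines, so read.table()
# takes the first column as row names; header = TRUE is stated explicitly
y <- as.matrix(read.table(f, header = TRUE))
rownames(y) <- NULL        # drop the "1", "2", ... row names again

print(y)
```

The same read.table() call should work on the attachment once it has been
saved to disk, with f replaced by the path of the saved file.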

I clustered the same dataset with R 2.14.1, on the same computer and OS, using
nstart=1000, and I did it 1000 times without the slightest problem.
Moreover, at the cost of repeating myself, the 'empty cluster' error is
plausibly the symptom of a bug, because it _should_ never happen with the
Hartigan-Wong algorithm (the default for kmeans).
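
The repeated-run check described above could be sketched along these lines.
It is a hedged illustration, not the exact commands used for this report: the
number of centers (k) and the number of repetitions (n_runs) are assumptions,
since the original calls are not shown in this message. Note also that
tryCatch() only captures R errors such as 'empty cluster'; a segmentation
fault is not an R condition and will not be caught this way.

```r
# The 40 x 6 matrix as posted in this message
x <- matrix(scan(text = "
4 5 5 3 5 5   7 7 6 5 6 6   6 6 5 5 5 5   5 3 4 3 2 4
4 3 2 5 3 2   6 6 6 5 6 6   6 7 6 6 7 6   4 3 5 4 5 5
3 5 5 5 5 6   4 5 3 2 4 4   7 7 7 5 7 7   3 4 2 2 2 2
4 6 6 4 6 6   5 6 5 6 6 6   4 5 5 5 4 3   5 6 6 6 6 6
7 7 7 6 7 6   3 2 3 3 4 2   6 5 5 4 5 4   5 4 1 5 1 3
4 5 5 4 6 5   3 4 6 5 6 3   2 3 2 3 3 3   5 6 5 3 4 5
6 6 6 6 6 6   5 4 5 5 5 5   5 6 6 1 3 6   4 4 4 3 3 5
6 7 5 5 4 6   3 2 2 2 3 2   2 4 1 6 4 3   4 6 4 5 4 5
3 2 2 3 3 3   2 3 6 5 4 4   2 2 1 1 1 2   2 3 2 3 2 3
3 6 5 5 3 5   7 3 3 7 3 5   2 2 4 4 2 4   2 4 3 2 3 2
", quiet = TRUE), ncol = 6, byrow = TRUE,
dimnames = list(NULL, paste0("Day", 1:6)))

k      <- 3    # ASSUMPTION: the number of centers is not shown in the message
n_runs <- 10   # ASSUMPTION: repetitions, kept small for illustration

set.seed(1)    # only so that this sketch is repeatable

msgs <- character(0)   # error messages from failed calls
fits <- list()         # results of successful calls
for (i in seq_len(n_runs)) {
  res <- tryCatch(kmeans(x, centers = k, nstart = 100),
                  error = function(e) conditionMessage(e))
  if (is.character(res)) {
    msgs <- c(msgs, res)
  } else {
    fits[[length(fits) + 1]] <- res
  }
}

n_failed <- length(msgs)
n_ok     <- length(fits)
cat("runs:", n_runs, " errors:", n_failed, "\n")

# For every successful fit, confirm that all k clusters are non-empty
sizes_ok <- vapply(fits,
                   function(f) length(f$size) == k && all(f$size > 0),
                   logical(1))
print(unique(msgs))
```

Running the same script under two R versions and comparing the error counts
is one way to check whether the behaviour is version-specific.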

Kind regards,
and thanks again for your time.

Luca Nanetti


On Sat, Feb 9, 2013 at 8:52 PM, Uwe Ligges
<ligges at statistik.tu-dortmund.de> wrote:

            
-------------- next part --------------
"Day1" "Day2" "Day3" "Day4" "Day5" "Day6"
"1" 4 5 5 3 5 5
"2" 7 7 6 5 6 6
"3" 6 6 5 5 5 5
"4" 5 3 4 3 2 4
"5" 4 3 2 5 3 2
"6" 6 6 6 5 6 6
"7" 6 7 6 6 7 6
"8" 4 3 5 4 5 5
"9" 3 5 5 5 5 6
"10" 4 5 3 2 4 4
"11" 7 7 7 5 7 7
"12" 3 4 2 2 2 2
"13" 4 6 6 4 6 6
"14" 5 6 5 6 6 6
"15" 4 5 5 5 4 3
"16" 5 6 6 6 6 6
"17" 7 7 7 6 7 6
"18" 3 2 3 3 4 2
"19" 6 5 5 4 5 4
"20" 5 4 1 5 1 3
"21" 4 5 5 4 6 5
"22" 3 4 6 5 6 3
"23" 2 3 2 3 3 3
"24" 5 6 5 3 4 5
"25" 6 6 6 6 6 6
"26" 5 4 5 5 5 5
"27" 5 6 6 1 3 6
"28" 4 4 4 3 3 5
"29" 6 7 5 5 4 6
"30" 3 2 2 2 3 2
"31" 2 4 1 6 4 3
"32" 4 6 4 5 4 5
"33" 3 2 2 3 3 3
"34" 2 3 6 5 4 4
"35" 2 2 1 1 1 2
"36" 2 3 2 3 2 3
"37" 3 6 5 5 3 5
"38" 7 3 3 7 3 5
"39" 2 2 4 4 2 4
"40" 2 4 3 2 3 2